Age | Commit message (Collapse) | Author | Files | Lines |
|
When running test-case gdb.threads/manythreads.exp with check-read1, I ran
into this hard-to-reproduce FAIL:
...
[New Thread 0x7ffff7318700 (LWP 31125)]^M
[Thread 0x7ffff7321700 (LWP 31124) exited]^M
[New T^C^M
^M
Thread 769 "manythreads" received signal SIGINT, Interrupt.^M
[Switching to Thread 0x7ffff6d66700 (LWP 31287)]^M
0x00007ffff7586a81 in clone () from /lib64/libc.so.6^M
(gdb) FAIL: gdb.threads/manythreads.exp: stop threads 1
...
The matching in the failing gdb_test_multiple is done in an intricate way,
trying to pass on some order and fail on another order.
Fix this by rewriting the regexps to match one line at most, and detecting
invalid order by setting and checking state variables.
Tested on x86_64-linux.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29177
|
|
The previous patch to add -prompt/-lbl to gdb_test introduced a
regression: Before, you could specify an explicit empty message to
indicate you didn't want to PASS, like so:
gdb_test COMMAND PATTERN ""
After said patch, gdb_test no longer distinguishes
no-message-specified vs empty-message, so tests that previously would
be silent on PASS, now started emitting PASS messages based on
COMMAND. This in turn introduced a number of PATH/DUPLICATE
violations in the testsuite.
This commit fixes all the regressions I could see.
This patch uses the new -nopass feature introduced in the previous
commit, but tries to avoid it if possible. Most of the patch fixes
DUPLICATE issues the usual way, of using with_test_prefix or explicit
unique messages.
See previous commit's log for more info.
In addition to looking for DUPLICATEs, I also looked for cases where
we would now end up with an empty message in gdb.sum, due to a
gdb_test being passed both no message and empty command. E.g., this
in gdb.ada/bp_reset.exp:
gdb_run_cmd
gdb_test "" "Breakpoint $decimal, foo\\.nested_sub \\(\\).*"
was resulting in this in gdb.sum:
PASS: gdb.ada/bp_reset.exp:
I fixed such cases by passing an explicit message. We may want to
make such cases error out.
Tested on x86_64 GNU/Linux, native and native-extended-gdbserver. I
see zero PATH cases now. I get zero DUPLICATEs with native testing
now. I still see some DUPLICATEs with native-extended-gdbserver, but
those were preexisting, unrelated to the gdb_test change.
Change-Id: I5375f23f073493e0672190a0ec2e847938a580b2
|
|
Currently GDB is not able to debug (Binary generated with Clang) variables
present in shared/private clause of OpenMP Task construct. Please note that
LLVM debugger LLDB is able to debug.
In case of OpenMP, compilers generate artificial functions which are not
present in actual program. This is done to apply parallelism to block of
code.
For non-artifical functions, DW_AT_name attribute should contains the name
exactly as present in actual program.
(Ref# http://wiki.dwarfstd.org/index.php?title=Best_Practices)
Since artificial functions are not present in actual program they not having
DW_AT_name and having DW_AT_linkage_name instead should be fine.
Currently GDB is invalidating any function not havnig DW_AT_name which is why
it is not able to debug OpenMP (Clang).
It should be fair to fallback to check DW_AT_linkage_name in case DW_AT_name
is absent.
|
|
Many test cases had a few lines in the beginning that look like:
if { condition } {
continue
}
Where conditions varied, but were mostly in the form of ![runto_main] or
[skip_*_tests], making it quite clear that this code block was supposed
to finish the test if it entered the code block. This generates TCL
errors, as most of these tests are not inside loops. All cases on which
this was an obvious mistake are changed in this patch.
|
|
When running test-case gdb.threads/fork-plus-threads.exp with check-readmore,
I run into:
...
[Inferior 11 (process 7029) exited normally]^M
[Inferior 1 (process 6956) exited normally]^M
FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: \
inferior 1 exited (timeout)
...
The problem is that the regexp consuming the "Inferior exited normally"
messages:
- consumes more than one of those messages at a time, but
- counts only one of those messages.
Fix this by adopting a line-by-line approach, which deals with those messages
one at a time.
Tested on x86_64-linux with native, check-read1 and check-readmore.
|
|
When testing gdb.threads/access-mem-running-thread-exit.exp with
--target_board=native-extended-gdbserver, we get:
Running gdb.threads/access-mem-running-thread-exit.exp ...
FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: second inferior: runto: run to main
WARNING: Timed out waiting for EOF in server after monitor exit
=== gdb Summary ===
# of expected passes 3
# of unexpected failures 1
# of unsupported tests 1
The problem is that the testcase spawns a second inferior with
-no-connection, and then runto_main does "run", which fails like so:
(gdb) run
Don't know how to run. Try "help target".
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: second inferior: runto: run to main
That "run" above failed because native-extended-gdbserver forces "set
auto-connect-native-target off", to prevent testcases from mistakenly
running programs with the native target, which would exactly be the
case here.
Fix this by letting the second inferior share the first inferior's
connection everywhere except on targets that do reload on run (e.g.,
--target_board=native-gdbserver).
Change-Id: Ib57105a238cbc69c57220e71261219fa55d329ed
|
|
failure of gdb.threads/fork-plus-threads.exp
This test sometimes fail like this:
info threads^M
Id Target Id Frame ^M
11.12 process 2270719 Couldn't get registers: No such process.^M
(gdb) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: no threads left
[Inferior 11 (process 2270719) exited normally]^M
info inferiors^M
Num Description Connection Executable ^M
* 1 <null> /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads ^M
11 <null> /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads ^M
(gdb) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left (the program exited)
I can get it to fail quite reliably by pinning it to a core:
$ taskset -c 5 make check TESTS="gdb.threads/fork-plus-threads.exp"
The previous attempt at fixing this was:
https://sourceware.org/pipermail/gdb-patches/2021-October/182846.html
What we see is part due to a possible unfortunate ordering of events
given by the kernel, and what could be considered a bug in GDB.
The test program makes a number of forks, waits them all, then exits.
Most of the time, GDB will get and process the exit event for inferior 1
after the exit events of all the children. But this is not guaranteed.
After the last child exits and is waited by the parent, the parent can
exit quickly, such that GDB collects from the kernel the exit events for
the parent and that child at the same time. It then chooses one event
at random, which can be the event for the parent. This will result in
the parent appearing to exit before its child. There's not much we can
do about it, so I think we have to adjust the test to cope.
After expect has seen the "exited normally" notification for inferior 1,
it immediately does an "info thread" that it expects to come back empty.
But at this point, GDB might not have processed inferior 11's (the last
child) exit event, so it will look like there is still a thread. Of
course that thread is dead, we just don't know it yet. But that makes
the "no thread" test fail. If the test waited just a bit more for the
"exited normally" notification for inferior 11, then the list of threads
would be empty.
So, first change, make the test collect all the "exited normally"
notifications for all inferiors before proceeding, that should ensure we
see an empty thread list. That would fix the first FAIL above.
However, we would still have the second FAIL, as we expect inferior 11
to not be there, it should have been deleted automatically. Inferior 11
is normally deleted when prune_inferiors is called. That is called by
normal_stop, which is only called by fetch_inferior_event only if the
event thread completed an execution command FSM (thread_fsm). But the
FSM for the continue command completed when inferior 1 exited. At that
point inferior 11 was not prunable, as it still had a thread. When
inferior 11 exits, prune_inferiors is not called.
I think that can be considered a GDB bug. From the user point of view,
there's no reason why in one case inferior 11 would be deleted and not
in the other case.
This patch makes the somewhat naive change to call prune_inferiors in
fetch_inferior_event, so that it is called in this case. It is placed
at this particular point in the function so that it is called after the
user inferior / thread selection is restored. If it was called before
that, inferior 11 wouldn't be pruned, because it would still be the
current inferior.
Change-Id: I48a15d118f30b1c72c528a9f805ed4974170484a
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=26272
|
|
LLVM's lld linker doesn't have the "-Ttext-segment" option, but
"--image-base" can be used instead.
To centralize the logic of checking which option is supported, add the
text_segment option to gdb_compile. Change tests that are currently
using -Ttext-segment to use that new option instead.
This patch fixes only compilation error, for example:
Before:
$ make check TESTS="gdb.base/jit-elf.exp" RUNTESTFLAGS="CC_FOR_TARGET=clang LDFLAGS_FOR_TARGET=-fuse-ld=ld"
Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-elf.exp ...
gdb compile failed, clang-13: warning: -Xlinker -Ttext-segment=0x7000000: 'linker' input unused [-Wunused-command-line-argument]
After:
$ make check TESTS="gdb.base/jit-elf.exp" RUNTESTFLAGS="CC_FOR_TARGET=clang LDFLAGS_FOR_TARGET=-fuse-ld=ld"
Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/jit-elf.exp ...
FAIL: gdb.base/jit-elf.exp: one_jit_test-1: continue to breakpoint: break here 1
FAIL: gdb.base/jit-elf.exp: one_jit_test-1: continue to breakpoint: break here 2
FAIL: gdb.base/jit-elf.exp: one_jit_test-2: continue to breakpoint: break here 1
FAIL: gdb.base/jit-elf.exp: one_jit_test-2: info function ^jit_function
FAIL: gdb.base/jit-elf.exp: one_jit_test-2: continue to breakpoint: break here 2
FAIL: gdb.base/jit-elf.exp: attach: one_jit_test-2: continue to breakpoint: break here 1
FAIL: gdb.base/jit-elf.exp: attach: one_jit_test-2: break here 1: attach
FAIL: gdb.base/jit-elf.exp: PIE: one_jit_test-1: continue to breakpoint: break here 1
FAIL: gdb.base/jit-elf.exp: PIE: one_jit_test-1: continue to breakpoint: break here 2
=== gdb Summary ===
# of expected passes 26
# of unexpected failures 9
Change-Id: I3678c5c9bbfc2f80671698e28a038e6b3d14e635
|
|
The test introduced by this patch would fail in this configuration, with
the native-gdbserver or native-extended-gdbserver boards:
FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=2: next to for loop
The problem is that the step operation is forgotten when handling the
fork/vfork. With "debug infrun" and "debug remote", it looks like this
(some lines omitted for brevity). We do the next:
[infrun] proceed: enter
[infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
[infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154304.0] at 0x5555555553bf
[infrun] do_target_resume: resume_ptid=4154304.0.0, step=1, sig=GDB_SIGNAL_0
[remote] Sending packet: $vCont;r5555555553bf,5555555553c4:p3f63c0.3f63c0;c:p3f63c0.-1#cd
[infrun] proceed: exit
We then handle a fork event:
[infrun] fetch_inferior_event: enter
[remote] wait: enter
[remote] Packet received: T05fork:p3f63ee.3f63ee;06:0100000000000000;07:b08e59f6ff7f0000;10:bf60e8f7ff7f0000;thread:p3f63c0.3f63c6;core:17;
[remote] wait: exit
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 4154304.4154310.0 [Thread 4154304.4154310],
[infrun] print_target_wait_results: status->kind = FORKED, child_ptid = 4154350.4154350.0
[infrun] handle_inferior_event: status->kind = FORKED, child_ptid = 4154350.4154350.0
[remote] Sending packet: $D;3f63ee#4b
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154310.0] at 0x7ffff7e860bf
[infrun] do_target_resume: resume_ptid=4154304.0.0, step=0, sig=GDB_SIGNAL_0
[remote] Sending packet: $vCont;c:p3f63c0.-1#73
[infrun] fetch_inferior_event: exit
In the first snippet, we resume the stepping thread with the range-stepping (r)
vCont command. But after handling the fork (detaching the fork child), we
resumed the whole process freely. The stepping thread, which was paused by
GDBserver while reporting the fork event, was therefore resumed freely, instead
of confined to the addresses of the stepped line. Note that since this
is a "next", it could be that we have entered a function, installed a
step-resume breakpoint, and it's ok to continue freely the stepping
thread, but that's not the case here. The two snippets shown above were
next to each other in the logs.
For the fork case, we can resume stepping right after handling the
event.
However, for the vfork case, where we are waiting for the
external child process to exec or exit, we only resume the thread that
called vfork, and keep the others stopped (see patch "gdb: fix handling of
vfork by multi-threaded program" prior in this series). So we can't
resume the stepping thread right now. Instead, do it after handling the
vfork-done event.
Change-Id: I92539c970397ce880110e039fe92b87480f816bd
|
|
(follow-fork-mode=parent, detach-on-fork=on)
There is a problem with how GDB handles a vfork happening in a
multi-threaded program. This problem was reported to me by somebody not
using vfork directly, but using system(3) in a multi-threaded program,
which may be implemented using vfork.
This patch only deals about the follow-fork-mode=parent,
detach-on-fork=on case, because it would be too much to chew at once to
fix the bugs in the other cases as well (I tried).
The problem
-----------
When a program vforks, the parent thread is suspended by the kernel
until the child process exits or execs. Specifically, in a
multi-threaded program, only the thread that called vfork is suspended,
other threads keep running freely. This is documented in the vfork(2)
man page ("Caveats" section).
Let's suppose GDB is handling a vfork and the user's desire is to detach
from the child. Before detaching the child, GDB must remove the software
breakpoints inserted in the shared parent/child address space, in case
there's a breakpoint in the path the child is going to take before
exec'ing or exit'ing (unlikely, but possible). Otherwise the child could
hit a breakpoint instruction while running outside the control of GDB,
which would make it crash. GDB must also avoid re-inserting breakpoints
in the parent as long as it didn't receive the "vfork done" event (that
is, when the child has exited or execed): since the address space is
shared with the child, that would re-insert breakpoints in the child
process also. So what GDB does is:
1. Receive "vfork" event for the parent
2. Remove breakpoints from the (shared) address space and set
program_space::breakpoints_not_allowed to avoid re-inserting them
3. Detach from the child thread
4. Resume the parent
5. Wait for and receive "vfork done" event for the parent
6. Clean program_space::breakpoints_not_allowed and re-insert
breakpoints
7. Resume the parent
Resuming the parent at step 4 is necessary in order for the kernel to
report the "vfork done" event. The kernel won't report a ptrace event
for a thread that is ptrace-stopped. But the theory behind this is that
between steps 4 and 5, the parent won't actually do any progress even
though it is ptrace-resumed, because the kernel keeps it suspended,
waiting for the child to exec or exit. So it doesn't matter for that
thread if breakpoints are not inserted.
The problem is when the program is multi-threaded. In step 4, GDB
resumes all threads of the parent. The thread that did the vfork stays
suspended by the kernel, so that's fine. But other threads are running
freely while breakpoints are removed, which is a problem because they
could miss a breakpoint that they should have hit.
The problem is present with all-stop and non-stop targets. The only
difference is that with an all-stop targets, the other threads are
stopped by the target when it reports the vfork event and are resumed by
the target when GDB resumes the parent. With a non-stop target, the
other threads are simply never stopped.
The fix
-------
There many combinations of settings to consider (all-stop/non-stop,
target-non-stop on/off, follow-fork-mode parent/child, detach-on-fork
on/off, schedule-multiple on/off), but for this patch I restrict the
scope to follow-fork-mode=parent, detach-on-fork=on. That's the
"default" case, where we detach the child and keep debugging the
parent. I tried to fix them all, but it's just too much to do at once.
The code paths and behaviors for when we don't detach the child are
completely different.
The guiding principle for this patch is that all threads of the vforking
inferior should be stopped as long as breakpoints are removed. This is
similar to handling in-line step-overs, in a way.
For non-stop targets (the default on Linux native), this is what
happens:
- In follow_fork, we call stop_all_threads to stop all threads of the
inferior
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done
- Back in handle_inferior_event, we call keep_going, which resumes only
the event thread (this is already the case, with a non-stop target).
This is the thread that will be waiting for vfork-done.
- When we get the vfork-done event, we go in the (new) handle_vfork_done
function to restart the previously stopped threads.
In the same scenario, but with an all-stop target:
- In follow_fork, no need to stop all threads of the inferior, the
target has stopped all threads of all its inferiors before returning
the event.
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done.
- Back in handle_inferior_event, we also call keep_going. However, we
only want to resume the event thread here, not all inferior threads.
In internal_resume_ptid (called by resume_1), we therefore now check
whether one of the inferiors we are about to resume has
thread_waiting_for_vfork_done set. If so, we only resume that
thread.
Note that when resuming multiple inferiors, one vforking and one not
non-vforking, we could resume the vforking thread from the vforking
inferior plus all threads from the non-vforking inferior. However,
this is not implemented, it would require more work.
- When we get the vfork-done event, the existing call to keep_going
naturally resumes all threads.
Testing-wise, add a test that tries to make the main thread hit a
breakpoint while a secondary thread calls vfork. Without the fix, the
main thread keeps going while breakpoints are removed, resulting in a
missed breakpoint and the program exiting.
Change-Id: I20eb78e17ca91f93c19c2b89a7e12c382ee814a1
|
|
linux_handle_extended_wait
The check removed by this patch, using current_inferior, looks wrong.
When debugging multiple inferiors with the Linux native target and
linux_handle_extended_wait is called, there's no guarantee about which
is the current inferior. The vfork-done event we receive could be for
any inferior. If the vfork-done event is for a non-current inferior, we
end up wrongfully ignoring it. As a result, the core never processes a
TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed
is never cleared, and breakpoints are never reinserted. However,
because the Linux native target decided to ignore the event, it resumed
the thread - while breakpoints out. And that's bad.
The proposed fix is to remove this check. Always report vfork-done
events and let infrun's logic decide if it should be ignored. We don't
save much cycles by filtering the event here.
Add a test that replicates the situation described above. See comments
in the test for more details.
Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
|
|
This adds a multi-threaded testcase that has all threads in the
process exit with a different exit code, and ensures that GDB reports
the thread group leader's exit status as the whole-process exit
status. Before this set of patches, this would randomly report the
exit code of some other thread, and thus fail.
Tested on Linux-x86_64, native and gdbserver.
Co-Authored-By: Pedro Alves <pedro@palves.net>
Change-Id: I30cba2ff4576fb01b5169cc72667f3268d919557
|
|
If we make GDB report the process EXIT event for the leader thread, as
will be done in a latter patch of this series, then
gdb.threads/current-lwp-dead.exp starts failing:
(gdb) break fn_return
Breakpoint 2 at 0x5555555551b5: file /home/pedro/rocm/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/current-lwp-dead.c, line 45.
(gdb) continue
Continuing.
[New LWP 2138466]
[Inferior 1 (process 2138459) exited normally]
(gdb) FAIL: gdb.threads/current-lwp-dead.exp: continue to breakpoint: fn_return (the program exited)
The inferior exit reported is actually correct. The main thread has
indeed exited, and that's the thread that has the right exit code to
report to the user, as that's the exit code that is reported to the
program's parent. In this case, GDB managed to collect the exit code
for the leader thread before reaping the other thread, because in
reality, the testcase isn't creating standard threads, it is using raw
clone, and the new clones are put in their own thread group.
Fix it by making the main "thread" not exit until the scenario we're
exercising plays out. Also, run the program to completion for
completeness.
The original program really wanted the leader thread to exit before
the fn_return function was reached -- it was important that the
current thread as pointed by inferior_ptid was gone when infrun got
the breakpoint event. I've tweaked the testcase to ensure that that
condition is still held, though it is no longer the main thread that
exits. This required a bit of synchronization between the threads,
which required using CLONE_VM unconditionally. The #ifdef guards were
added as a fix for
https://sourceware.org/bugzilla/show_bug.cgi?id=11214, though I don't
think they were necessary because the program is not using TLS. If it
turns out they were necessary, we can link the testcase with "-z now"
instead, which was mentioned as an alternative workaround in that
Bugzilla.
Change-Id: I7be2f0da4c2fe8f80a60bdde5e6c623d8bd5a0aa
|
|
If we make GDB report the process EXIT event for the leader thread,
instead of whatever is the last thread in the LWP list, as will be
done in a latter patch of this series, then
gdb.threads/current-lwp-dead.exp starts failing:
(gdb) FAIL: gdb.threads/clone-new-thread-event.exp: catch SIGUSR1 (the program exited)
This is a testcase race -- the main thread does not wait for the
spawned clone "thread" to finish before exiting, so the main program
may exit before the second thread is scheduled and reports its
SIGUSR1. With the change to make GDB report the EXIT for the leader,
the race is 100% reproducible by adding a sleep(), like so:
--- c/gdb/testsuite/gdb.threads/clone-new-thread-event.c
+++ w/gdb/testsuite/gdb.threads/clone-new-thread-event.c
@@ -51,6 +51,7 @@ local_gettid (void)
static int
fn (void *unused)
{
+ sleep (1);
tkill (local_gettid (), SIGUSR1);
return 0;
}
Resulting in:
Breakpoint 1, main (argc=1, argv=0x7fffffffd418) at gdb.threads/clone-new-thread-event.c:65
65 stack = malloc (STACK_SIZE);
(gdb) continue
Continuing.
[New LWP 3715562]
[Inferior 1 (process 3715555) exited normally]
(gdb) FAIL: gdb.threads/clone-new-thread-event.exp: catch SIGUSR1 (the program exited)
That inferior exit reported is actually correct. The main thread has
indeed exited, and that's the thread that has the right exit code to
report to the user, as that's the exit code that is reported to the
program's parent. In this case, GDB managed to collect the exit code
for the leader thread before reaping the other thread, because in
reality, the testcase isn't creating standard threads, it is using raw
clone, and the new clones are put in their own thread group.
Fix it by making the main thread wait for the child to exit. Also,
run the program to completion for completeness.
Change-Id: I315cd3dc2b9e860395dcab9658341ea868d7a6bf
|
|
Starting with commit
commit 1da5d0e664e362857153af8682321a89ebafb7f6
Date: Tue Jan 4 08:02:24 2022 -0700
Change how Python architecture and language are handled
we see a failure in gdb.threads/killed-outside.exp:
...
Executing on target: kill -9 16622 (timeout = 300)
builtin_spawn -ignore SIGHUP kill -9 16622
continue
Continuing.
Couldn't get registers: No such process.
(gdb) [Thread 0x7ffff77c2700 (LWP 16626) exited]
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
FAIL: gdb.threads/killed-outside.exp: prompt after first continue (timeout)
This is not a regression but a failure due to a change in GDB's
output. Prior to the aforementioned commit, GDB has been printing the
"Couldn't get registers: No such process." message twice. The second
one came from
(top-gdb) bt
#0 amd64_linux_nat_target::fetch_registers (this=0x555557f31440 <the_amd64_linux_nat_target>, regcache=0x555558805ce0, regnum=16) at /gdb-up/gdb/amd64-linux-nat.c:225
#1 0x000055555640ac5f in target_ops::fetch_registers (this=0x555557d636d0 <the_thread_db_target>, arg0=0x555558805ce0, arg1=16) at /gdb-up/gdb/target-delegates.c:502
#2 0x000055555641a647 in target_fetch_registers (regcache=0x555558805ce0, regno=16) at /gdb-up/gdb/target.c:3945
#3 0x0000555556278e68 in regcache::raw_update (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:587
#4 0x0000555556278f14 in readable_regcache::raw_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:601
#5 0x00005555562792aa in readable_regcache::cooked_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:690
#6 0x000055555627965e in readable_regcache::cooked_read_value (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:748
#7 0x0000555556352a37 in sentinel_frame_prev_register (this_frame=0x555558181090, this_prologue_cache=0x5555581810a8, regnum=16) at /gdb-up/gdb/sentinel-frame.c:53
#8 0x0000555555fa4773 in frame_unwind_register_value (next_frame=0x555558181090, regnum=16) at /gdb-up/gdb/frame.c:1235
#9 0x0000555555fa420d in frame_register_unwind (next_frame=0x555558181090, regnum=16, optimizedp=0x7fffffffd570, unavailablep=0x7fffffffd574, lvalp=0x7fffffffd57c, addrp=0x7fffffffd580,
realnump=0x7fffffffd578, bufferp=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1143
#10 0x0000555555fa455f in frame_unwind_register (next_frame=0x555558181090, regnum=16, buf=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1199
#11 0x00005555560178e2 in i386_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/i386-tdep.c:1972
#12 0x0000555555cd2b9d in gdbarch_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/gdbarch.c:3007
#13 0x0000555555fa3a5b in frame_unwind_pc (this_frame=0x555558181090) at /gdb-up/gdb/frame.c:948
#14 0x0000555555fa7621 in get_frame_pc (frame=0x555558181160) at /gdb-up/gdb/frame.c:2572
#15 0x0000555555fa7706 in get_frame_address_in_block (this_frame=0x555558181160) at /gdb-up/gdb/frame.c:2602
#16 0x0000555555fa77d0 in get_frame_address_in_block_if_available (this_frame=0x555558181160, pc=0x7fffffffd708) at /gdb-up/gdb/frame.c:2665
#17 0x0000555555fa5f8d in select_frame (fi=0x555558181160) at /gdb-up/gdb/frame.c:1890
#18 0x0000555555fa5bab in lookup_selected_frame (a_frame_id=..., frame_level=-1) at /gdb-up/gdb/frame.c:1720
#19 0x0000555555fa5e47 in get_selected_frame (message=0x0) at /gdb-up/gdb/frame.c:1810
#20 0x0000555555cc9c6e in get_current_arch () at /gdb-up/gdb/arch-utils.c:848
#21 0x000055555625b239 in gdbpy_before_prompt_hook (extlang=0x555557451f20 <extension_language_python>, current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ")
at /gdb-up/gdb/python/python.c:1063
#22 0x0000555555f7cfbb in ext_lang_before_prompt (current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/extension.c:922
#23 0x0000555555f7d442 in std::_Function_handler<void (char const*), void (*)(char const*)>::_M_invoke(std::_Any_data const&, char const*&&) (__functor=...,
__args#0=@0x7fffffffd900: 0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:316
#24 0x0000555555f752dd in std::function<void (char const*)>::operator()(char const*) const (this=0x55555817d838, __args#0=0x555557f4d890 <top_prompt+16> "(gdb) ")
at /usr/include/c++/7/bits/std_function.h:706
#25 0x0000555555f75100 in gdb::observers::observable<char const*>::notify (this=0x555557f49060 <gdb::observers::before_prompt>, args#0=0x555557f4d890 <top_prompt+16> "(gdb) ")
at /gdb-up/gdb/../gdbsupport/observable.h:150
#26 0x0000555555f736dc in top_level_prompt () at /gdb-up/gdb/event-top.c:444
#27 0x0000555555f735ba in display_gdb_prompt (new_prompt=0x0) at /gdb-up/gdb/event-top.c:411
#28 0x00005555564611a7 in tui_on_command_error () at /gdb-up/gdb/tui/tui-interp.c:205
#29 0x0000555555c2173f in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#30 0x0000555555e10c20 in std::function<void ()>::operator()() const (this=0x5555580f9028) at /usr/include/c++/7/bits/std_function.h:706
#31 0x0000555555e10973 in gdb::observers::observable<>::notify() const (this=0x555557f48d20 <gdb::observers::command_error>) at /gdb-up/gdb/../gdbsupport/observable.h:150
#32 0x00005555560e9b3f in start_event_loop () at /gdb-up/gdb/main.c:438
#33 0x00005555560e9bcc in captured_command_loop () at /gdb-up/gdb/main.c:481
#34 0x00005555560eb616 in captured_main (data=0x7fffffffddd0) at /gdb-up/gdb/main.c:1348
#35 0x00005555560eb67c in gdb_main (args=0x7fffffffddd0) at /gdb-up/gdb/main.c:1363
#36 0x0000555555c1b6b3 in main (argc=12, argv=0x7fffffffded8) at /gdb-up/gdb/gdb.c:32
Commit 1da5d0e664 eliminated the call to 'get_current_arch'
in 'gdbpy_before_prompt_hook'. Hence, the second instance of
"Couldn't get registers: No such process." does not appear anymore.
Fix the failure by updating the regular expression in the test.
|
|
Rename 'set debug lin-lwp' to 'set debug linux-nat' and 'show debug
lin-lwp' to 'show debug linux-nat'.
I've updated the documentation and help text to match, as well as
making it clear that the debug that is coming out relates to all
aspects of Linux native inferior support, not just the LWP aspect of
it.
The boundary between general "native" target debug, and the lwp
specific part of that debug was always a little blurry, but the actual
debug variable inside GDB is debug_linux_nat, and the print routine
linux_nat_debug_printf, is used throughout the linux-nat.c file, not
just for lwp related debug, so the new name seems to make more sense.
|
|
When running the testsuite, I have:
Running .../gdb/testsuite/gdb.threads/staticthreads.exp ...
DUPLICATE: gdb.threads/staticthreads.exp: couldn't compile staticthreads.c: unrecognized error
Fix by using foreach_with_prefix instead of foreach when preparing the
test case.
Testeed on x86_64-linux both in a setup where the test fails to prepare
and in a setup where the test fails to setup.
|
|
When running test-case gdb.threads/schedlock-thread-exit.exp on a system with
system compiler gcc 4.8.5, I run into:
...
src/gdb/testsuite/gdb.threads/schedlock-thread-exit.c:33:3: error: \
'for' loop initial declarations are only allowed in C99 mode
...
Fix this by:
- using -std=c99, or
- using -std=gnu99, in case that's required, or
- in the case of the jit test-cases, rewriting the for loops.
Tested on x86_64-linux, both with gcc 4.8.5 and gcc 7.5.0.
|
|
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
|
|
While working with pending fork events, I wondered what would happen if
the user detached an inferior while a thread of that inferior had a
pending fork event. What happens with the fork child, which is
ptrace-attached by the GDB process (or by GDBserver), but not known to
the core? Sure enough, neither the core of GDB or the target detach the
child process, so GDB (or GDBserver) just stays ptrace-attached to the
process. The result is that the fork child process is stuck, while you
would expect it to be detached and run.
Make GDBserver detach of fork children it knows about. That is done in
the generic handle_detach function. Since a process_info already exists
for the child, we can simply call detach_inferior on it.
GDB-side, make the linux-nat and remote targets detach of fork children
known because of pending fork events. These pending fork events can be
stored in:
- thread_info::pending_waitstatus, if the core has consumed the event
but then saved it for later (for example, because it got the event
while stopping all threads, to present an all-stop stop on top of a
non-stop target)
- thread_info::pending_follow: if we ran to a "catch fork" and we
detach at that moment
Additionally, pending fork events can be in target-specific fields:
- For linux-nat, they can be in lwp_info::status and
lwp_info::waitstatus.
- For the remote target, they could be stored as pending stop replies,
saved in `remote_state::notif_state::pending_event`, if not
acknowledged yet, or in `remote_state::stop_reply_queue`, if
acknowledged. I followed the model of remove_new_fork_children for
this: call remote_notif_get_pending_events to process /
acknowledge any unacknowledged notification, then look through
stop_reply_queue.
Update the gdb.threads/pending-fork-event.exp test (and rename it to
gdb.threads/pending-fork-event-detach.exp) to try to detach the process
while it is stopped with a pending fork event. In order to verify that
the fork child process is correctly detached and resumes execution
outside of GDB's control, make that process create a file in the test
output directory, and make the test wait $timeout seconds for that file
to appear (it happens instantly if everything goes well).
This test catches a bug in linux-nat.c, also reported as PR 28512
("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig()
const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind ==
TARGET_WAITKIND_SIGNALLED' failed.). When detaching a thread with a
pending event, get_detach_signal unconditionally fetches the signal
stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However,
that is only valid if the pending event is of type
TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (iit
would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean
the thread does not exist anymore, so we wouldn't be detaching it). Add
a condition in get_detach_signal to access the signal number only if the
wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0
instead (since the thread was not stopped with a signal to begin with).
Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to
verify that we consider events in pending stop replies in the remote
target. This test has many threads constantly forking, and we detach
from the program while the program is executing. That gives us some
chance that we detach while a fork stop reply is stored in the remote
target. To verify that we correctly detach all fork children, we ask
the parent to exit by sending it a SIGUSR1 signal and have it write a
file to the filesystem before exiting. Because the parent's main thread
joins the forking threads, and the forking threads wait for their fork
children to exit, if some fork child is not detach by GDB, the parent
will not write the file, and the test will time out. If I remove the
new remote_detach_pid calls in remote.c, the test fails eventually if I
run it in a loop.
There is a known limitation: we don't remove breakpoints from the
children before detaching it. So the children, could hit a trap
instruction after being detached and crash. I know this is wrong, and
it should be fixed, but I would like to handle that later. The current
patch doesn't fix everything, but it's a step in the right direction.
Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
|
|
This patch aims at fixing a bug where an inferior is unexpectedly
created when a fork happens at the same time as another event, and that
other event is reported to GDB first (and the fork event stays pending
in GDBserver). This happens for example when we step a thread and
another thread forks at the same time. The bug looks like (if I
reproduce the included test by hand):
(gdb) show detach-on-fork
Whether gdb will detach the child of a fork is on.
(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".
(gdb) si
[New inferior 2]
Reading /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread from remote target...
Reading /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread from remote target...
Reading symbols from target:/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-while-fork-in-other-thread/step-while-fork-in-other-thread...
[New Thread 965190.965190]
[Switching to Thread 965190.965190]
Remote 'g' packet reply is too long (expected 560 bytes, got 816 bytes): ... <long series of bytes>
The sequence of events leading to the problem is:
- We are using the all-stop user-visible mode as well as the
synchronous / all-stop variant of the remote protocol
- We have two threads, thread A that we single-step and thread B that
calls fork at the same time
- GDBserver's linux_process_target::wait pulls the "single step
complete SIGTRAP" and the "fork" events from the kernel. It
arbitrarily choses one event to report, it happens to be the
single-step SIGTRAP. The fork stays pending in the thread_info.
- GDBserver send that SIGTRAP as a stop reply to GDB
- While in stop_all_threads, GDB calls update_thread_list, which ends
up querying the remote thread list using qXfer:threads:read.
- In the reply, GDBserver includes the fork child created as a result
of thread B's fork.
- GDB-side, the remote target sees the new PID, calls
remote_notice_new_inferior, which ends up unexpectedly creating a new
inferior, and things go downhill from there.
The problem here is that as long as GDB did not process the fork event,
it should pretend the fork child does not exist. Ultimately, this event
will be reported, we'll go through follow_fork, and that process will be
detached.
The remote target (GDB-side), has some code to remove from the reported
thread list the threads that are the result of forks not processed by
GDB yet. But that only works for fork events that have made their way
to the remote target (GDB-side), but haven't been consumed by the core
yet, so are still lingering as pending stop replies in the remote target
(see remove_new_fork_children in remote.c). But in our case, the fork
event hasn't made its way to the GDB-side remote target. We need to
implement the same kind of logic GDBserver-side: if there exists a
thread / inferior that is the result of a fork event GDBserver hasn't
reported yet, it should exclude that thread / inferior from the reported
thread list.
This was actually discussed a while ago, but not implemented AFAIK:
https://pi.simark.ca/gdb-patches/1ad9f5a8-d00e-9a26-b0c9-3f4066af5142@redhat.com/#t
https://sourceware.org/pipermail/gdb-patches/2016-June/133906.html
Implementation details-wise, the fix for this is all in GDBserver. The
Linux layer of GDBserver already tracks unreported fork parent / child
relationships using the lwp_info::fork_relative, in order to avoid
wildcard actions resuming fork childs unknown to GDB. This information
needs to be made available to the handle_qxfer_threads_worker function,
so it can filter the reported threads. Add a new thread_pending_parent
target function that allows the Linux target to return the parent of an
eventual fork child.
Testing-wise, the test replicates pretty-much the sequence of events
shown above. The setup of the test makes it such that the main thread
is about to fork. We stepi the other thread, so that the step completes
very quickly, in a single event. Meanwhile, the main thread is resumed,
so very likely has time to call fork. This means that the bug may not
reproduce every time (if the main thread does not have time to call
fork), but it will reproduce more often than not. The test fails
without the fix applied on the native-gdbserver and
native-extended-gdbserver boards.
At some point I suspected that which thread called fork and which thread
did the step influenced the order in which the events were reported, and
therefore the reproducibility of the bug. So I made the test try both
combinations: main thread forks while other thread steps, and vice
versa. I'm not sure this is still necessary, but I left it there
anyway. It doesn't hurt to test a few more combinations.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28288
Change-Id: I2158d5732fc7d7ca06b0eb01f88cf27bf527b990
|
|
On OBS I ran into a failure in test-case gdb.threads/thread-specific-bp.exp:
...
(gdb) PASS: gdb.threads/thread-specific-bp.exp: non-stop: continue to end
info breakpoint^M
Num Type Disp Enb Address What^M
1 breakpoint keep y 0x0000555555555167 in main at $src:36^M
breakpoint already hit 1 time^M
2 breakpoint keep y 0x0000555555555151 in start at $src:23^M
breakpoint already hit 1 time^M
3 breakpoint keep y 0x0000555555555167 in main at $src:36 thread 2^M
stop only in thread 2^M
4 breakpoint keep y 0x000055555555515c in end at $src:29^M
breakpoint already hit 1 time^M
(gdb) [Thread 0x7ffff7db1640 (LWP 19984) exited]^M
Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.^M
FAIL: gdb.threads/thread-specific-bp.exp: non-stop: \
thread-specific breakpoint was deleted (timeout)
...
Fix this by waiting for the "[Thread 0x7ffff7db1640 (LWP 19984) exited]"
message before issuing the "info breakpoint command".
Tested on x86_64-linux.
|
|
GDB hangs when doing this:
- launch inferior with multiple threads
- multiple threads hit some breakpoint(s)
- one breakpoint hit is presented as a stop, the rest are saved as
pending wait statuses
- "set scheduler-locking on"
- resume the currently selected thread (because of scheduler-locking,
it's the only one resumed), let it execute until exit
- GDB hangs, not showing the prompt, impossible to interrupt with ^C
When the resumed thread exits, we expect the target to return a
TARGET_WAITKIND_NO_RESUMED event, and that's what we see:
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] random_pending_event_thread: None found.
[Thread 0x7ffff7d9c700 (LWP 309357) exited]
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: -1.0.0 [process -1],
[infrun] print_target_wait_results: status->kind = no-resumed
[infrun] handle_inferior_event: status->kind = no-resumed
[infrun] handle_no_resumed: TARGET_WAITKIND_NO_RESUMED (ignoring: found resumed)
[infrun] prepare_to_wait: prepare_to_wait
[infrun] reset: reason=handling event
[infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
[infrun] fetch_inferior_event: exit
The problem is in handle_no_resumed: we check if some other thread is
actually resumed, to see if we should ignore that event (see comments in
that function for more info). If this condition is true:
(thread->executing () || thread->has_pending_waitstatus ())
... then we ignore the event. The problem is that there are some non-resumed
threads with a pending event, which makes us ignore the event. But these
threads are not resumed, so we end up waiting while nothing executes, hence
waiting for ever.
My first fix was to change the condition to:
(thread->executing ()
|| (thread->resumed () && thread->has_pending_waitstatus ()))
... but then it occured to me that we could simply check for:
(thread->resumed ())
Since "executing" implies "resumed", checking simply for "resumed"
covers threads that are resumed and executing, as well as threads that
are resumed with a pending status, which is what we want.
Change-Id: Ie796290f8ae7f34c026ca3a8fcef7397414f4780
|
|
PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent
failure) shows that GDB can hit an unexpected scenario -- it can
happen that the kernel manages to open a /proc/PID/task/LWP/mem file,
but then reading from the file returns 0/EOF, even though the process
hasn't exited or execed.
"0" out of read/write is normally what you get when the address space
of the process the file was open for is gone, because the process
execed or exited. So when GDB gets the 0, it returns memory access
failure. In the bad case in question, the process hasn't execed or
exited, so GDB fails a memory access when the access should have
worked.
GDB has code in place to gracefully handle the case of opening the
/proc/PID/task/LWP/mem just while the LWP is exiting -- most often the
open fails with EACCES or ENOENT. When it happens, GDB just tries
opening the file for a different thread of the process. The testcase
is written such that it stresses GDB's logic of closing/reopening the
/proc/PID/task/LWP/mem file, by constantly spawning short lived
threads.
However, there's a window where the kernel manages to find the thread,
but the thread exits just after and clears its address space pointer.
In this case, the kernel creates a file successfully, but the file
ends up with no address space associated, so a subsequent read/write
returns 0/EOF too, just like if the whole process had execed or
exited. This is the case in question that GDB does not handle.
Oleg Nesterov gave this suggestion as workaround for that race:
gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm.
If statm reports something non-zero, then open() was "successfull".
I think that might work. However, I didn't try it, because I realized
we have another nasty race that that wouldn't fix.
The other race I realized is that because we close/reopen the
/proc/PID/task/LWP/mem file when GDB switches to a different inferior,
then it can happen that GDB reopens /proc/PID/task/LWP/mem just after
a thread execs, and before GDB has seen the corresponding exec event.
I.e., we can open a /proc/PID/task/LWP/mem file accessing the
post-exec address space thinking we're accessing the pre-exec address
space.
A few months back, Simon, Oleg and I discussed a similar race:
[Bug gdb/26754] Race condition when resuming threads and one does an exec
https://sourceware.org/bugzilla/show_bug.cgi?id=26754
The solution back then was to make the kernel fail any ptrace
operation until the exec event is consumed, with this kernel commit:
commit dbb5afad100a828c97e012c6106566d99f041db6
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed May 12 15:33:08 2021 +0200
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed May 12 10:45:22 2021 -0700
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
This however, only applies to ptrace, not to the /proc/pid/mem file
opening case. Also, even if it did apply to the file open case, we
would want to support current kernels until such a fix is more wide
spread anyhow.
So all in all, this commit gives up on the idea of only ever keeping
one /proc/pid/mem file descriptor open. Instead, make GDB open a
/proc/pid/mem per inferior, and keep it open until the inferior exits,
is detached or execs. Make GDB open the file right after the inferior
is created or is attached to or forks, at which point we know the
inferior is stable and stopped and isn't thus going to exec, or have a
thread exit, and so the file open won't fail (unless the whole process
is SIGKILLed from outside GDB, at which point it doesn't matter
whether we open the file).
This way, we avoid both races described above, at the expense of using
more file descriptors (one per inferior).
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
|
|
On openSUSE Tumbleweed with glibc-debuginfo installed I get:
...
(gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
where^M
#0 print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
#1 0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
#2 0x00007ffff7d56b37 in start_thread (arg=<optimized out>) \
at pthread_create.c:435^M
#3 0x00007ffff7ddb640 in clone3 () \
at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81^M
(gdb) PASS: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...
while without debuginfo installed I get instead:
...
(gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
where^M
#0 print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
#1 0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
#2 0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6^M
#3 0x00007ffff7ddb640 in clone3 () from /lib64/libc.so.6^M
(gdb) FAIL: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...
The problem is that the regexp used:
...
"\(from .*libpthread\|at pthread_create\|in pthread_create\)"
...
expects the 'from' part to match libpthread, but in glibc 2.34 libpthread has
been merged into libc.
Fix this by updating the regexp.
Tested on x86_64-linux.
|
|
When running test-case gdb.threads/check-libthread-db.exp on openSUSE
Tumbleweed (with glibc 2.34) I get:
...
(gdb) continue^M
Continuing.^M
[Thread debugging using libthread_db enabled]^M
Using host libthread_db library "/lib64/libthread_db.so.1".^M
Stopped due to shared library event:^M
Inferior loaded /lib64/libm.so.6^M
/lib64/libc.so.6^M
(gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: continue
...
The check expect the inferior to load libpthread, but since glibc 2.34
libpthread has been integrated into glibc, and consequently it's no longer
a dependency:
...
$ ldd outputs/gdb.threads/check-libthread-db/check-libthread-db
linux-vdso.so.1 (0x00007ffe4cae4000)
libm.so.6 => /lib64/libm.so.6 (0x00007f167c77c000)
libc.so.6 => /lib64/libc.so.6 (0x00007f167c572000)
/lib64/ld-linux-x86-64.so.2 (0x00007f167c86e000)
...
Fix this by updating the regexp to expect libpthread or libc.
Tested on x86_64-linux.
|
|
As follow-up to this discussion:
https://sourceware.org/pipermail/gdb-patches/2020-August/171385.html
... make runto_main not pass no-message to runto. This means that if we
fail to run to main, for some reason, we'll emit a FAIL. This is the
behavior we want the majority of (if not all) the time.
Without this, we rely on tests logging a failure if runto_main fails,
otherwise. They do so in a very inconsisteny mannet, sometimes using
"fail", "unsupported" or "untested". The messages also vary widly.
This patch removes all these messages as well.
Also, remove a few "fail" where we call runto (and not runto_main). by
default (without an explicit no-message argument), runto prints a
failure already. In two places, gdb.multi/multi-re-run.exp and
gdb.python/py-pp-registration.exp, remove "message" passed to runto.
This removes a few PASSes that we don't care about (but FAILs will still
be printed if we fail to run to where we want to). This aligns their
behavior with the rest of the testsuite.
Change-Id: Ib763c98c5f4fb6898886b635210d7c34bd4b9023
|
|
gdb.threads/process-dies-while-detaching.exp
The test-case gdb.threads/process-dies-while-detaching.exp takes about 20s
when using hw watchpoints, but when forcing sw watchpoints (using the patch
mentioned in PR28375#c0), the test-case takes instead 3m14s.
Also, it show a FAIL:
...
(gdb) continue^M
Continuing.^M
Cannot find user-level thread for LWP 10324: generic error^M
(gdb) FAIL: gdb.threads/process-dies-while-detaching.exp: single-process:
continue: watchpoint: continue
...
for which PR28375 was filed.
Modify the test-case to:
- add the hw/sw axis to the watchpoint testing, to ensure that we
observe the sw watchpoint behaviour also on can-use-hw-watchpoints
architectures.
- skip the hw breakpoint testing if not supported
- set the sw watchpoint later to avoid making the test
too slow. This still triggers the same PR, but now takes just 24s.
This patch adds a KFAIL for PR28375.
Tested on x86_64-linux.
|
|
Minimize gdb restarts, applying the following rules:
- don't use prepare_for_testing unless necessary
- don't use clean_restart unless necessary
Also, if possible, replace build_for_executable + clean_restart
with prepare_for_testing for brevity.
Touches 68 test-cases.
Tested on x86_64-linux.
|
|
Replace {additional_flags=-fPIE ldflags=-pie} with {pie}.
This makes sure that the test-cases properly error out when using target board
unix/-fno-PIE/-no-pie.
Tested on x86_64-linux.
|
|
When running test-case gdb.threads/continue-pending-status.exp with native, I
have:
...
(gdb) continue^M
Continuing.^M
PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue for ctrl-c
^C^M
Thread 1 "continue-pendin" received signal SIGINT, Interrupt.^M
[Switching to Thread 0x7ffff7fc4740 (LWP 1276)]^M
0x00007ffff758e4c0 in __GI___nanosleep () at nanosleep.c:27^M
27 return SYSCALL_CANCEL (nanosleep, requested_time, remaining);^M
(gdb) PASS: gdb.threads/continue-pending-status.exp: attempt 0: caught interrupt
...
but with target board unix/-m32, I run into:
...
(gdb) continue^M
Continuing.^M
PASS: gdb.threads/continue-pending-status.exp: attempt 0: continue for ctrl-c
[Thread 0xf74aeb40 (LWP 31957) exited]^M
[Thread 0xf7cafb40 (LWP 31956) exited]^M
[Inferior 1 (process 31952) exited normally]^M
(gdb) Quit^M
...
The problem is that the sleep (300) call at the end of main is interrupted,
which causes the inferior to exit before the ctrl-c can be send.
This problem is described at "Interrupted System Calls" in the docs, and the
suggested solution (using a sleep loop) indeed fixes the problem.
Fix this instead using the more prevalent:
...
alarm (300);
...
while (1) sleep (1);
...
which is roughly equivalent because the sleep is called at the end of main,
but slightly better because it guards against hangs from the start rather than
from the end of main.
Likewise in gdb.base/watch_thread_num.exp.
Likewise in gdb.btrace/enable-running.exp, but use the sleep loop there,
because the sleep is not called at the end of main.
Tested on x86_64-linux.
|
|
When running test-case gdb.threads/check-libthread-db.exp on openSUSE
Tumbleweed with glibc 2.33, I get:
...
(gdb) maint check libthread-db^M
Running libthread_db integrity checks:^M
Got thread 0x7ffff7c79b80 => 9354 => 0x7ffff7c79b80; errno = 0 ... OK^M
libthread_db integrity checks passed.^M
(gdb) FAIL: gdb.threads/check-libthread-db.exp: user-initiated check: \
libpthread.so not initialized (pattern 2)
...
The test-case expects instead:
...
Got thread 0x0 => 9354 => 0x0 ... OK^M
...
which is what I get on openSUSE Leap 15.2 with glibc 2.26, and what is
described in the test-case like this:
...
# libthread_db should fake a single thread with th_unique == NULL.
...
Using a breakpoint on check_thread_db_callback we can compare the two
scenarios, and find that in the latter case we hit this code in glibc function
iterate_thread_list in nptl_db/td_ta_thr_iter.c:
...
if (next == 0 && fake_empty)
{
/* __pthread_initialize_minimal has not run. There is just the main
thread to return. We cannot rely on its thread register. They
sometimes contain garbage that would confuse us, left by the
kernel at exec. So if it looks like initialization is incomplete,
we only fake a special descriptor for the initial thread. */
td_thrhandle_t th = { ta, 0 };
return callback (&th, cbdata_p) != 0 ? TD_DBERR : TD_OK;
}
...
while in the former case we don't because this preceding statement doesn't
result in next == 0:
...
err = DB_GET_FIELD (next, ta, head, list_t, next, 0);
...
Note that the comment mentions __pthread_initialize_minimal, but in both cases
it has already run before we hit the callback, so it's possible the comment is
no longer accurate.
The change in behaviour bisect to glibc commit 1daccf403b "nptl: Move stack
list variables into _rtld_global", which moves the initialization of stack
list variables such as __stack_user to an earlier moment, which explains well
enough the observed difference.
Fix this by updating the regexp patterns to agree with what libthread-db is
telling us.
Tested on x86_64-linux, both with glibc 2.33 and 2.26.
gdb/testsuite/ChangeLog:
2021-07-07 Tom de Vries <tdevries@suse.de>
PR testsuite/27690
* gdb.threads/check-libthread-db.exp: Update patterns for glibc 2.33.
|
|
Currently, on GNU/Linux, if you try to access memory and you have a
running thread selected, GDB fails the memory accesses, like:
(gdb) c&
Continuing.
(gdb) p global_var
Cannot access memory at address 0x555555558010
Or:
(gdb) b main
Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
Warning:
Cannot insert breakpoint 2.
Cannot access memory at address 0x55555555524d
This patch removes this limitation. It teaches the native Linux
target to read/write memory even if the target is running. And it
does this without temporarily stopping threads. We now get:
(gdb) c&
Continuing.
(gdb) p global_var
$1 = 123
(gdb) b main
Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.
(The scenarios above work correctly with current GDBserver, because
GDBserver temporarily stops all threads in the process whenever GDB
wants to access memory (see prepare_to_access_memory /
done_accessing_memory). Freezing the whole process makes sense when
we need to be sure that we have a consistent view of memory and don't
race with the inferior changing it at the same time as GDB is
accessing it. But I think that's a too-heavy hammer for the default
behavior. I think that ideally, whether to stop all threads or not
should be policy decided by gdb core, probably best implemented by
exposing something like gdbserver's prepare_to_access_memory /
done_accessing_memory to gdb core.)
Currently, if we're accessing (reading/writing) just a few bytes, then
the Linux native backend does not try accessing memory via
/proc/<pid>/mem and goes straight to ptrace
PTRACE_PEEKTEXT/PTRACE_POKETEXT. However, ptrace always fails when
the ptracee is running. So the first step is to prefer
/proc/<pid>/mem even for small accesses. Without further changes
however, that may cause a performance regression, due to constantly
opening and closing /proc/<pid>/mem for each memory access. So the
next step is to keep the /proc/<pid>/mem file open across memory
accesses. If we have this, then it doesn't make sense anymore to even
have the ptrace fallback, so the patch disables it.
I've made it such that GDB only ever has one /proc/<pid>/mem file open
at any time. As long as a memory access hits the same inferior
process as the previous access, then we reuse the previously open
file. If however, we access memory of a different process, then we
close the previous file and open a new one for the new process.
If we wanted, we could keep one /proc/<pid>/mem file open per
inferior, and never close them (unless the inferior exits or execs).
However, having seen bfd patches recently about hitting too many open
file descriptors, I kept the logic to have only one file open tops.
Also, we need to handle memory accesses for processes for which we
don't have an inferior object, for when we need to detach a
fork-child, and we'd probaly want to handle caching the open file for
that scenario (no inferior for process) too, which would probably end
up meaning caching for last non-inferior process, which is very much
what I'm proposing anyhow. So always having one file open likely ends
up a smaller patch.
The next step is handling the case of GDB reading/writing memory
through a thread that is running and exits. The access should not
result in a user-visible failure if the inferior/process is still
alive.
Once we manage to open a /proc/<lwpid>/mem file, then that file is
usable for memory accesses even if the corresponding lwp exits and is
reaped. I double checked that trying to open the same
/proc/<lwpid>/mem path again fails because the lwp is really gone so
there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
previously open file remains usable. It's only when the whole process
execs that we need to reopen a new file.
When the kernel destroys the whole address space, i.e., when the
process exits or execs, the reads/writes fail with 0 aka EOF, in which
case there's nothing else to do than returning a memory access
failure. Note this means that when we get an exec event, we need to
reopen the file, to access the process's new address space.
If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it for exits before we open it and before we reap the
LWP (i.e., the LWP is zombie), the open fails with EACCES. The patch
handles this by just looking for another thread until it finds one
that we can open a /proc/<pid>/mem successfully for.
If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening has exited and we already reaped it, which is the case
if the selected thread is in THREAD_EXIT state, the open fails with
ENOENT. The patch handles this the same way as a zombie race
(EACCES), instead of checking upfront whether we're accessing a
known-exited thread, because that would result in more complicated
code, because we also need to handle accessing lwps that are not
listed in the core thread list, and it's the core thread list that
records the THREAD_EXIT state.
The patch includes two testcases:
#1 - gdb.base/access-mem-running.exp
This is the conceptually simplest - it is single-threaded, and has
GDB read and write memory while the program is running. It also
tests setting a breakpoint while the program is running, and checks
that the breakpoint is hit immediately.
#2 - gdb.threads/access-mem-running-thread-exit.exp
This one is more elaborate, as it continuously spawns short-lived
threads in order to exercise accessing memory just while threads are
exiting. It also spawns two different processes and alternates
accessing memory between the two processes to exercise the reopening
the /proc file frequently. This also ends up exercising GDB reading
from an exited thread frequently. I confirmed by putting abort()
calls in the EACCES/ENOENT paths added by the patch that we do hit
all of them frequently with the testcase. It also exits the
process's main thread (i.e., the main thread becomes zombie), to
make sure accessing memory in such a corner-case scenario works now
and in the future.
The tests fail on GNU/Linux native before the code changes, and pass
after. They pass against current GDBserver, again because GDBserver
supports memory access even if all threads are running, by
transparently pausing the whole process.
gdb/ChangeLog:
yyyy-mm-dd Pedro Alves <pedro@palves.net>
PR mi/15729
PR gdb/13463
* linux-nat.c (linux_nat_target::detach): Close the
/proc/<pid>/mem file if it was open for this process.
(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
/proc/<pid>/mem file if it was open for this process.
(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
if it was open for this process.
(linux_nat_target::xfer_partial): Adjust. Do not fall back to
inf_ptrace_target::xfer_partial for memory accesses.
(last_proc_mem_file): New.
(maybe_close_proc_mem_file): New.
(linux_proc_xfer_memory_partial_pid): New, with bits factored out
from linux_proc_xfer_partial.
(linux_proc_xfer_partial): Delete.
(linux_proc_xfer_memory_partial): New.
gdb/testsuite/ChangeLog
yyyy-mm-dd Pedro Alves <pedro@palves.net>
PR mi/15729
PR gdb/13463
* gdb.base/access-mem-running.c: New.
* gdb.base/access-mem-running.exp: New.
* gdb.threads/access-mem-running-thread-exit.c: New.
* gdb.threads/access-mem-running-thread-exit.exp: New.
Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
|
|
With a testsuite setup modified to make expect wait a little bit longer for
gdb output (see PR27957), I reliably run into:
...
PASS: gdb.threads/multi-create-ns-info-thr.exp: continue to breakpoint 1
FAIL: gdb.threads/multi-create-ns-info-thr.exp: continue to breakpoint 2 \
(timeout)
...
This is due to this regexp:
...
-re "Breakpoint $decimal,.*$srcfile:$bp_location1" {
...
consuming several lines using the ".*" part, while it's intended to match one
line looking like this:
...
Thread 1 "multi-create-ns" hit Breakpoint 2, create_function () \
at multi-create.c:45^M
...
Fix this by limiting the regexp to one line.
Tested on x86_64-linux.
gdb/testsuite/ChangeLog:
2021-06-08 Tom de Vries <tdevries@suse.de>
* gdb.threads/multi-create-ns-info-thr.exp: Limit breakpoint regexp to
one line.
|
|
The current test case leaves detached processes running at the end of
the test. This patch changes the test to use a barrier wait to ensure all
processes exit cleanly at the end of the tests.
gdb/testsuite/ChangeLog:
2021-06-02 Carl Love <cel@us.ibm.com>
* gdb.threads/threadapply.c: Add global mybarrier.
(main): Add pthread_barrier_init.
(thread_function): Replace while loop with myp increment and
pthread_barrier_wait.
|
|
When running test-case gdb.threads/detach-step-over.exp with target board
readnow, I run into:
...
Reading symbols from /lib64/libc.so.6...^M
Reading symbols from \
/usr/lib/debug/lib64/libc-2.26.so-2.26-lp152.26.6.1.x86_64.debug...^M
Expanding full symbols from \
/usr/lib/debug/lib64/libc-2.26.so-2.26-lp152.26.6.1.x86_64.debug...^M
FAIL: gdb.threads/detach-step-over.exp: \
breakpoint-condition-evaluation=host: target-non-stop=on: non-stop=on: \
displaced=off: iter 2: attach (timeout)
...
Fix this by doing exp_continue when encountering the "Reading symbols" or
"Expanding full symbols" lines.
This is still fragile and times out with a higher load, similated f.i. by
stress -c 5. Fix that by using a timeout factor of 2.
Tested on x86_64-linux.
gdb/testsuite/ChangeLog:
2021-05-05 Tom de Vries <tdevries@suse.de>
* gdb.threads/detach-step-over.exp: Do exp_continue when encountering
"Reading symbols" or "Expanding full symbols" lines. Using timeout
factor of 2 for attach.
|
|
When running test-case gdb.threads/fork-plus-threads.exp with target board
readnow, I run into:
...
[LWP 9362 exited]^M
[New LWP 9365]^M
[New LWP 9363]^M
[New LWP 9364]^M
FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: \
inferior 1 exited (timeout)
...
There is code in the test-case to prevent timeouts with readnow:
...
-re "Thread \[^\r\n\]+ exited" {
# Avoid timeout with check-read1
exp_continue
}
-re "New Thread \[^\r\n\]+" {
# Avoid timeout with check-read1
exp_continue
}
...
but this doesn't trigger because we get LWP rather than Thread.
Fix this by making these regexps accept LWP as well.
Tested on x86_64-linux.
gdb/testsuite/ChangeLog:
2021-05-05 Tom de Vries <tdevries@suse.de>
* gdb.threads/fork-plus-threads.exp: Handle "New LWP <n>" and
"LWP <n> exited" messages.
|
|
I noticed that using foreach_with_prefix could make things a bit
less verbose. No changes in behavior expected.
gdb/testsuite/ChangeLog:
* gdb.threads/fork-plus-threads.exp: Use foreach_with_prefix.
Change-Id: I06aa6e3d10a9cfb6ada11547aefe8c70b636ac81
|
|
When running test-case gdb.threads/gcore-thread.exp on openSUSE Tumbleweed,
I run into these XFAILs:
...
XFAIL: gdb.threads/gcore-thread.exp: clear __stack_user.next
XFAIL: gdb.threads/gcore-thread.exp: clear stack_used.next
...
Apart from the xfail, the test-case also sets core0file to "":
...
-re "No symbol \"${symbol}\" in current context\\.\r\n$gdb_prompt $" {
xfail $test
# Do not do the verification.
set core0file ""
}
...
After which we run into this FAIL, because gdb_core_cmd fails to load a
core file called "":
...
(gdb) core ^M
No core file now.^M
(gdb) FAIL: gdb.threads/gcore-thread.exp: core0file: \
re-load generated corefile
...
Fix this FAIL by skipping gdb_core_cmd if the core file is "".
Tested on x86_64-linux.
gdb/testsuite/ChangeLog:
2021-04-06 Tom de Vries <tdevries@suse.de>
PR testsuite/27691
* gdb.threads/gcore-thread.exp: Don't call gdb_core_cmd with core
file "".
|
|
Resolve all of the duplicate test names in the gdb.threads/*.exp set
of tests (that I see). Nothing very exciting here, mostly either
giving tests explicit testnames, or adding with_test_prefix.
The only interesting one is gdb.threads/execl.exp, I believe the
duplicate test name was caused by an actual duplicate test. I've
remove the simpler form of the test. I don't believe we've lost any
test coverage with this change.
gdb/testsuite/ChangeLog:
* gdb.threads/execl.exp: Remove duplicate 'info threads' test.
Make use of $gdb_test_name instead of creating a separate $test
variable.
* gdb.threads/print-threads.exp: Add a with_test_prefix instead of
adding a '($name)' at the end of each test. This also catches the
one place where '($name)' was missing, and so caused a duplicate
test name.
* gdb.threads/queue-signal.exp: Give tests unique names to avoid
duplicate test names based on the command being tested.
* gdb.threads/signal-command-multiple-signals-pending.exp:
Likewise.
* lib/gdb.exp (gdb_compile_shlib_pthreads): Tweak test name to
avoid duplicate testnames when a test script uses this proc and
also gdb_compile_pthreads.
* lib/prelink-support.exp (build_executable_own_libs): Use
with_test_prefix to avoid duplicate test names when we call
build_executable twice.
|
|
While running tests on arm-none-eabi, I noticed following errors in some
gdb.threads tests.
ERROR: can't read "testfile": no such variable
These were being caused by ${testfile} being used before 'standard_testfile'
which sets it. This patch just moves standard_testfile before the use.
2021-02-09 Abid Qadeer <abidh@codesourcery.com>
gdb/testsuite/ChangeLog:
* gdb.threads/signal-command-handle-nopass.exp: Call
'standard_testfile' before using 'testfile'.
* gdb.threads/signal-command-multiple-signals-pending.exp: Likewise.
* gdb.threads/signal-delivered-right-thread.exp: Likewise
* gdb.threads/signal-sigtrap.exp: Likewise
|
|
This adds a testcase that exercises detaching while GDB is stepping
over a breakpoint, in all combinations of:
- maint target non-stop off/on
- set non-stop on/off
- displaced stepping on/off
This exercises the bugs fixed in the previous 8 patches.
gdb/testsuite/ChangeLog:
* gdb.threads/detach-step-over.c: New file.
* gdb.threads/detach-step-over.exp: New file.
|
|
This adds a testcase exercising attaching to a multi-threaded process,
in all combinations of:
- set non-stop on/off
- maint target non-stop off/on
- "attach" vs "attach &"
This exercises the bugs fixed in the two previous patches.
gdb/testsuite/ChangeLog:
* gdb.threads/attach-non-stop.c: New file.
* gdb.threads/attach-non-stop.exp: New file.
|
|
When running test-case gdb.threads/killed-outside.exp with target board
unix/-m32, we run into:
...
(gdb) PASS: gdb.threads/killed-outside.exp: get pid of inferior
Executing on target: kill -9 10969 (timeout = 300)
spawn -ignore SIGHUP kill -9 10969^M
continue^M
Continuing.^M
[Thread 0xf7cb4b40 (LWP 10973) exited]^M
^M
Program terminated with signal SIGKILL, Killed.^M
The program no longer exists.^M
(gdb) FAIL: gdb.threads/killed-outside.exp: prompt after first continue
...
Fix this by allowing this output.
Tested on x86_64-linux.
gdb/testsuite/ChangeLog:
2021-01-26 Tom de Vries <tdevries@suse.de>
* gdb.threads/killed-outside.exp: Allow regular output.
|
|
gdb.threads/signal-while-stepping-over-bp-other-thread.exp
Commit 3ec3145c5dd6 ("gdb: introduce scoped debug prints") updated some
tests using "set debug infrun" to handle the fact that a debug print is
now shown after the prompt, after an inferior stop. The same issue
happens in gdb.threads/signal-while-stepping-over-bp-other-thread.exp.
If I run it in a loop, it eventually fails like these other tests.
The problem is that the testsuite expects to see $gdb_prompt followed by
the end of the buffer. It happens that expect reads $gdb_prompt and the
debug print at the same time, in which case the regexp never matches and
we get a timeout.
The fix is the same as was done in 3ec3145c5dd6, make the testsuite
believe that the prompt is the standard GDB prompt followed by that
debug print.
Since that test uses gdb_test_sequence, and the expected prompt is in
gdb_test_sequence, add a -prompt switch to gdb_test_sequence to override
the prompt used for that call.
gdb/testsuite/ChangeLog:
* lib/gdb.exp (gdb_test_sequence): Accept -prompt switch.
* gdb.threads/signal-while-stepping-over-bp-other-thread.exp:
Pass prompt containing debug print to gdb_test_sequence.
Change-Id: I33161c53ddab45cdfeadfd50b964f8dc3caa9729
|
|
I spent a lot of time reading infrun debug logs recently, and I think
they could be made much more readable by being indented, to clearly see
what operation is done as part of what other operation. In the current
format, there are no visual cues to tell where things start and end,
it's just a big flat list. It's also difficult to understand what
caused a given operation (e.g. a call to resume_1) to be done.
To help with this, I propose to add the new scoped_debug_start_end
structure, along with a bunch of macros to make it convenient to use.
The idea of scoped_debug_start_end is simply to print a start and end
message at construction and destruction. It also increments/decrements
a depth counter in order to make debug statements printed during this
range use some indentation. Some care is taken to handle the fact that
debug can be turned on or off in the middle of such a range. For
example, a "set debug foo 1" command in a breakpoint command, or a
superior GDB manually changing the debug_foo variable.
Two macros are added in gdbsupport/common-debug.h, which are helpers to
define module-specific macros:
- scoped_debug_start_end: takes a message that is printed both at
construction / destruction, with "start: " and "end: " prefixes.
- scoped_debug_enter_exit: prints hard-coded "enter" and "exit"
messages, to denote the entry and exit of a function.
I added some examples in the infrun module to give an idea of how it can
be used and what the result looks like. The macros are in capital
letters (INFRUN_SCOPED_DEBUG_START_END and
INFRUN_SCOPED_DEBUG_ENTER_EXIT) to mimic the existing SCOPE_EXIT, but
that can be changed if you prefer something else.
Here's an excerpt of the debug
statements printed when doing "continue", where a displaced step is
started:
[infrun] proceed: enter
[infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
[infrun] global_thread_step_over_chain_enqueue: enqueueing thread Thread 0x7ffff75a5640 (LWP 2289301) in global step over chain
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 1
[infrun] start_step_over: resuming [Thread 0x7ffff75a5640 (LWP 2289301)] for step-over
[infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=1, current thread [Thread 0x7ffff75a5640 (LWP 2289301)] at 0x5555555551bd
[displaced] displaced_step_prepare_throw: displaced-stepping Thread 0x7ffff75a5640 (LWP 2289301) now
[displaced] prepare: selected buffer at 0x5555555550c2
[displaced] prepare: saved 0x5555555550c2: 1e fa 31 ed 49 89 d1 5e 48 89 e2 48 83 e4 f0 50
[displaced] amd64_displaced_step_copy_insn: copy 0x5555555551bd->0x5555555550c2: c7 45 fc 00 00 00 00 eb 13 8b 05 d4 2e 00 00 83
[displaced] displaced_step_prepare_throw: prepared successfully thread=Thread 0x7ffff75a5640 (LWP 2289301), original_pc=0x5555555551bd, displaced_pc=0x5555555550c2
[displaced] resume_1: run 0x5555555550c2: c7 45 fc 00
[infrun] infrun_async: enable=1
[infrun] prepare_to_wait: prepare_to_wait
[infrun] start_step_over: [Thread 0x7ffff75a5640 (LWP 2289301)] was resumed.
[infrun] operator(): step-over queue now empty
[infrun] start_step_over: exit
[infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
[infrun] proceed: resuming Thread 0x7ffff7da7740 (LWP 2289296)
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [Thread 0x7ffff7da7740 (LWP 2289296)] at 0x7ffff7f7d9b7
[infrun] prepare_to_wait: prepare_to_wait
[infrun] proceed: resuming Thread 0x7ffff7da6640 (LWP 2289300)
[infrun] resume_1: thread Thread 0x7ffff7da6640 (LWP 2289300) has pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP (currently_stepping=0).
[infrun] prepare_to_wait: prepare_to_wait
[infrun] proceed: [Thread 0x7ffff75a5640 (LWP 2289301)] resumed
[infrun] proceed: resuming Thread 0x7ffff6da4640 (LWP 2289302)
[infrun] resume_1: thread Thread 0x7ffff6da4640 (LWP 2289302) has pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP (currently_stepping=0).
[infrun] prepare_to_wait: prepare_to_wait
[infrun] proceed: end: resuming threads, all-stop-on-top-of-non-stop
[infrun] proceed: exit
We can easily see where the call to `proceed` starts and end. We can
also see why there are a bunch of resume_1 calls, it's because we are
resuming threads, emulating all-stop on top of a non-stop target.
We also see that debug statements nest well with other modules that have
been migrated to use the "new" debug statement helpers (because they all
use debug_prefixed_vprintf in the end. I think this is desirable, for
example we could see the debug statements about reading the DWARF info
of a library nested under the debug statements about loading that
library.
Of course, modules that haven't been migrated to use the "new" helpers
will still print without indentations. This will be one good reason to
migrate them.
I think the runtime cost (when debug statements are disabled) of this is
reasonable, given the improvement in readability. There is the cost of
the conditionals (like standard debug statements), one more condition
(if (m_must_decrement_print_depth)) and the cost of constructing a stack
object, which means copying a fews pointers.
Adding the print in fetch_inferior_event breaks some tests that use "set
debug infrun", because it prints a debug statement after the prompt. I
adapted these tests to cope with it, by using the "-prompt" switch of
gdb_test_multiple to as if this debug statement is part of the expected
prompt. It's unfortunate that we have to do this, but I think the debug
print is useful, and I don't want a few tests to get in the way of
adding good debug output.
gdbsupport/ChangeLog:
* common-debug.h (debug_print_depth): New.
(struct scoped_debug_start_end): New.
(scoped_debug_start_end): New.
(scoped_debug_enter_exit): New.
* common-debug.cc (debug_prefixed_vprintf): Print indentation.
gdb/ChangeLog:
* debug.c (debug_print_depth): New.
* infrun.h (INFRUN_SCOPED_DEBUG_START_END): New.
(INFRUN_SCOPED_DEBUG_ENTER_EXIT): New.
* infrun.c (start_step_over): Use
INFRUN_SCOPED_DEBUG_ENTER_EXIT.
(proceed): Use INFRUN_SCOPED_DEBUG_ENTER_EXIT and
INFRUN_SCOPED_DEBUG_START_END.
(fetch_inferior_event): Use INFRUN_SCOPED_DEBUG_ENTER_EXIT.
gdbserver/ChangeLog:
* debug.cc (debug_print_depth): New.
gdb/testsuite/ChangeLog:
* gdb.base/ui-redirect.exp: Expect infrun debug print after
prompt.
* gdb.threads/ia64-sigill.exp: Likewise.
* gdb.threads/watchthreads-reorder.exp: Likewise.
Change-Id: I7c3805e6487807aa63a1bae318876a0c69dce949
|
|
This commits the result of running gdb/copyright.py as per our Start
of New Year procedure...
gdb/ChangeLog
Update copyright year range in copyright header of all GDB files.
|
|
displaced steps
Today, GDB only allows a single displaced stepping operation to happen
per inferior at a time. There is a single displaced stepping buffer per
inferior, whose address is fixed (obtained with
gdbarch_displaced_step_location), managed by infrun.c.
In the case of the AMD ROCm target [1] (in the context of which this
work has been done), it is typical to have thousands of threads (or
waves, in SMT terminology) executing the same code, hitting the same
breakpoint (possibly conditional) and needing to to displaced step it at
the same time. The limitation of only one displaced step executing at a
any given time becomes a real bottleneck.
To fix this bottleneck, we want to make it possible for threads of a
same inferior to execute multiple displaced steps in parallel. This
patch builds the foundation for that.
In essence, this patch moves the task of preparing a displaced step and
cleaning up after to gdbarch functions. This allows using different
schemes for allocating and managing displaced stepping buffers for
different platforms. The gdbarch decides how to assign a buffer to a
thread that needs to execute a displaced step.
On the ROCm target, we are able to allocate one displaced stepping
buffer per thread, so a thread will never have to wait to execute a
displaced step.
On Linux, the entry point of the executable if used as the displaced
stepping buffer, since we assume that this code won't get used after
startup. From what I saw (I checked with a binary generated against
glibc and musl), on AMD64 we have enough space there to fit two
displaced stepping buffers. A subsequent patch makes AMD64/Linux use
two buffers.
In addition to having multiple displaced stepping buffers, there is also
the idea of sharing displaced stepping buffers between threads. Two
threads doing displaced steps for the same PC could use the same buffer
at the same time. Two threads stepping over the same instruction (same
opcode) at two different PCs may also be able to share a displaced
stepping buffer. This is an idea for future patches, but the
architecture built by this patch is made to allow this.
Now, the implementation details. The main part of this patch is moving
the responsibility of preparing and finishing a displaced step to the
gdbarch. Before this patch, preparing a displaced step is driven by the
displaced_step_prepare_throw function. It does some calls to the
gdbarch to do some low-level operations, but the high-level logic is
there. The steps are roughly:
- Ask the gdbarch for the displaced step buffer location
- Save the existing bytes in the displaced step buffer
- Ask the gdbarch to copy the instruction into the displaced step buffer
- Set the pc of the thread to the beginning of the displaced step buffer
Similarly, the "fixup" phase, executed after the instruction was
successfully single-stepped, is driven by the infrun code (function
displaced_step_finish). The steps are roughly:
- Restore the original bytes in the displaced stepping buffer
- Ask the gdbarch to fixup the instruction result (adjust the target's
registers or memory to do as if the instruction had been executed in
its original location)
The displaced_step_inferior_state::step_thread field indicates which
thread (if any) is currently using the displaced stepping buffer, so it
is used by displaced_step_prepare_throw to check if the displaced
stepping buffer is free to use or not.
This patch defers the whole task of preparing and cleaning up after a
displaced step to the gdbarch. Two new main gdbarch methods are added,
with the following semantics:
- gdbarch_displaced_step_prepare: Prepare for the given thread to
execute a displaced step of the instruction located at its current PC.
Upon return, everything should be ready for GDB to resume the thread
(with either a single step or continue, as indicated by
gdbarch_displaced_step_hw_singlestep) to make it displaced step the
instruction.
- gdbarch_displaced_step_finish: Called when the thread stopped after
having started a displaced step. Verify if the instruction was
executed, if so apply any fixup required to compensate for the fact
that the instruction was executed at a different place than its
original pc. Release any resources that were allocated for this
displaced step. Upon return, everything should be ready for GDB to
resume the thread in its "normal" code path.
The displaced_step_prepare_throw function now pretty much just offloads
to gdbarch_displaced_step_prepare and the displaced_step_finish function
offloads to gdbarch_displaced_step_finish.
The gdbarch_displaced_step_location method is now unnecessary, so is
removed. Indeed, the core of GDB doesn't know how many displaced step
buffers there are nor where they are.
To keep the existing behavior for existing architectures, the logic that
was previously implemented in infrun.c for preparing and finishing a
displaced step is moved to displaced-stepping.c, to the
displaced_step_buffer class. Architectures are modified to implement
the new gdbarch methods using this class. The behavior is not expected
to change.
The other important change (which arises from the above) is that the
core of GDB no longer prevents concurrent displaced steps. Before this
patch, start_step_over walks the global step over chain and tries to
initiate a step over (whether it is in-line or displaced). It follows
these rules:
- if an in-line step is in progress (in any inferior), don't start any
other step over
- if a displaced step is in progress for an inferior, don't start
another displaced step for that inferior
After starting a displaced step for a given inferior, it won't start
another displaced step for that inferior.
In the new code, start_step_over simply tries to initiate step overs for
all the threads in the list. But because threads may be added back to
the global list as it iterates the global list, trying to initiate step
overs, start_step_over now starts by stealing the global queue into a
local queue and iterates on the local queue. In the typical case, each
thread will either:
- have initiated a displaced step and be resumed
- have been added back by the global step over queue by
displaced_step_prepare_throw, because the gdbarch will have returned
that there aren't enough resources (i.e. buffers) to initiate a
displaced step for that thread
Lastly, if start_step_over initiates an in-line step, it stops
iterating, and moves back whatever remaining threads it had in its local
step over queue to the global step over queue.
Two other gdbarch methods are added, to handle some slightly annoying
corner cases. They feel awkwardly specific to these cases, but I don't
see any way around them:
- gdbarch_displaced_step_copy_insn_closure_by_addr: in
arm_pc_is_thumb, arm-tdep.c wants to get the closure for a given
buffer address.
- gdbarch_displaced_step_restore_all_in_ptid: when a process forks
(at least on Linux), the address space is copied. If some displaced
step buffers were in use at the time of the fork, we need to restore
the original bytes in the child's address space.
These two adjustments are also made in infrun.c:
- prepare_for_detach: there may be multiple threads doing displaced
steps when we detach, so wait until all of them are done
- handle_inferior_event: when we handle a fork event for a given
thread, it's possible that other threads are doing a displaced step at
the same time. Make sure to restore the displaced step buffer
contents in the child for them.
[1] https://github.com/ROCm-Developer-Tools/ROCgdb
gdb/ChangeLog:
* displaced-stepping.h (struct
displaced_step_copy_insn_closure): Adjust comments.
(struct displaced_step_inferior_state) <step_thread,
step_gdbarch, step_closure, step_original, step_copy,
step_saved_copy>: Remove fields.
(struct displaced_step_thread_state): New.
(struct displaced_step_buffer): New.
* displaced-stepping.c (displaced_step_buffer::prepare): New.
(write_memory_ptid): Move from infrun.c.
(displaced_step_instruction_executed_successfully): New,
factored out of displaced_step_finish.
(displaced_step_buffer::finish): New.
(displaced_step_buffer::copy_insn_closure_by_addr): New.
(displaced_step_buffer::restore_in_ptid): New.
* gdbarch.sh (displaced_step_location): Remove.
(displaced_step_prepare, displaced_step_finish,
displaced_step_copy_insn_closure_by_addr,
displaced_step_restore_all_in_ptid): New.
* gdbarch.c: Re-generate.
* gdbarch.h: Re-generate.
* gdbthread.h (class thread_info) <displaced_step_state>: New
field.
(thread_step_over_chain_remove): New declaration.
(thread_step_over_chain_next): New declaration.
(thread_step_over_chain_length): New declaration.
* thread.c (thread_step_over_chain_remove): Make non-static.
(thread_step_over_chain_next): New.
(global_thread_step_over_chain_next): Use
thread_step_over_chain_next.
(thread_step_over_chain_length): New.
(global_thread_step_over_chain_enqueue): Add debug print.
(global_thread_step_over_chain_remove): Add debug print.
* infrun.h (get_displaced_step_copy_insn_closure_by_addr):
Remove.
* infrun.c (get_displaced_stepping_state): New.
(displaced_step_in_progress_any_inferior): Remove.
(displaced_step_in_progress_thread): Adjust.
(displaced_step_in_progress): Adjust.
(displaced_step_in_progress_any_thread): New.
(get_displaced_step_copy_insn_closure_by_addr): Remove.
(gdbarch_supports_displaced_stepping): Use
gdbarch_displaced_step_prepare_p.
(displaced_step_reset): Change parameter from inferior to
thread.
(displaced_step_prepare_throw): Implement using
gdbarch_displaced_step_prepare.
(write_memory_ptid): Move to displaced-step.c.
(displaced_step_restore): Remove.
(displaced_step_finish): Implement using
gdbarch_displaced_step_finish.
(start_step_over): Allow starting more than one displaced step.
(prepare_for_detach): Handle possibly multiple threads doing
displaced steps.
(handle_inferior_event): Handle possibility that fork event
happens while another thread displaced steps.
* linux-tdep.h (linux_displaced_step_prepare): New.
(linux_displaced_step_finish): New.
(linux_displaced_step_copy_insn_closure_by_addr): New.
(linux_displaced_step_restore_all_in_ptid): New.
(linux_init_abi): Add supports_displaced_step parameter.
* linux-tdep.c (struct linux_info) <disp_step_buf>: New field.
(linux_displaced_step_prepare): New.
(linux_displaced_step_finish): New.
(linux_displaced_step_copy_insn_closure_by_addr): New.
(linux_displaced_step_restore_all_in_ptid): New.
(linux_init_abi): Add supports_displaced_step parameter,
register displaced step methods if true.
(_initialize_linux_tdep): Register inferior_execd observer.
* amd64-linux-tdep.c (amd64_linux_init_abi_common): Add
supports_displaced_step parameter, adjust call to
linux_init_abi. Remove call to
set_gdbarch_displaced_step_location.
(amd64_linux_init_abi): Adjust call to
amd64_linux_init_abi_common.
(amd64_x32_linux_init_abi): Likewise.
* aarch64-linux-tdep.c (aarch64_linux_init_abi): Adjust call to
linux_init_abi. Remove call to
set_gdbarch_displaced_step_location.
* arm-linux-tdep.c (arm_linux_init_abi): Likewise.
* i386-linux-tdep.c (i386_linux_init_abi): Likewise.
* alpha-linux-tdep.c (alpha_linux_init_abi): Adjust call to
linux_init_abi.
* arc-linux-tdep.c (arc_linux_init_osabi): Likewise.
* bfin-linux-tdep.c (bfin_linux_init_abi): Likewise.
* cris-linux-tdep.c (cris_linux_init_abi): Likewise.
* csky-linux-tdep.c (csky_linux_init_abi): Likewise.
* frv-linux-tdep.c (frv_linux_init_abi): Likewise.
* hppa-linux-tdep.c (hppa_linux_init_abi): Likewise.
* ia64-linux-tdep.c (ia64_linux_init_abi): Likewise.
* m32r-linux-tdep.c (m32r_linux_init_abi): Likewise.
* m68k-linux-tdep.c (m68k_linux_init_abi): Likewise.
* microblaze-linux-tdep.c (microblaze_linux_init_abi): Likewise.
* mips-linux-tdep.c (mips_linux_init_abi): Likewise.
* mn10300-linux-tdep.c (am33_linux_init_osabi): Likewise.
* nios2-linux-tdep.c (nios2_linux_init_abi): Likewise.
* or1k-linux-tdep.c (or1k_linux_init_abi): Likewise.
* riscv-linux-tdep.c (riscv_linux_init_abi): Likewise.
* s390-linux-tdep.c (s390_linux_init_abi_any): Likewise.
* sh-linux-tdep.c (sh_linux_init_abi): Likewise.
* sparc-linux-tdep.c (sparc32_linux_init_abi): Likewise.
* sparc64-linux-tdep.c (sparc64_linux_init_abi): Likewise.
* tic6x-linux-tdep.c (tic6x_uclinux_init_abi): Likewise.
* tilegx-linux-tdep.c (tilegx_linux_init_abi): Likewise.
* xtensa-linux-tdep.c (xtensa_linux_init_abi): Likewise.
* ppc-linux-tdep.c (ppc_linux_init_abi): Adjust call to
linux_init_abi. Remove call to
set_gdbarch_displaced_step_location.
* arm-tdep.c (arm_pc_is_thumb): Call
gdbarch_displaced_step_copy_insn_closure_by_addr instead of
get_displaced_step_copy_insn_closure_by_addr.
* rs6000-aix-tdep.c (rs6000_aix_init_osabi): Adjust calls to
clear gdbarch methods.
* rs6000-tdep.c (struct ppc_inferior_data): New structure.
(get_ppc_per_inferior): New function.
(ppc_displaced_step_prepare): New function.
(ppc_displaced_step_finish): New function.
(ppc_displaced_step_restore_all_in_ptid): New function.
(rs6000_gdbarch_init): Register new gdbarch methods.
* s390-tdep.c (s390_gdbarch_init): Don't call
set_gdbarch_displaced_step_location, set new gdbarch methods.
gdb/testsuite/ChangeLog:
* gdb.arch/amd64-disp-step-avx.exp: Adjust pattern.
* gdb.threads/forking-threads-plus-breakpoint.exp: Likewise.
* gdb.threads/non-stop-fair-events.exp: Likewise.
Change-Id: I387cd235a442d0620ec43608fd3dc0097fcbf8c8
|
|
When a process does an exec, all its program space is replaced with the
newly loaded executable. All non-main threads disappear and the main
thread starts executing at the entry point of the new executable.
Things can go wrong if a displaced step operation is in progress while
we process the exec event.
If the main thread is the one executing the displaced step: when that
thread (now executing in the new executable) stops somewhere (say, at a
breakpoint), displaced_step_fixup will run and clear up the state. We
will execute the "fixup" phase for the instruction we single-stepped in
the old program space. We are now in a completely different context,
so doing the fixup may corrupt the state.
If it is a non-main thread that is doing the displaced step: while
handling the exec event, GDB deletes the thread_info representing that
thread (since the thread doesn't exist in the inferior after the exec).
But inferior::displaced_step_state::step_thread will still point to it.
When handling events later, this condition, in displaced_step_fixup,
will likely never be true:
/* Was this event for the thread we displaced? */
if (displaced->step_thread != event_thread)
return 0;
... since displaced->step_thread points to a deleted thread (unless that
storage gets re-used for a new thread_info, but that wouldn't be good
either). This effectively makes the displaced stepping buffer occupied
for ever. When a thread in the new program space will want to do a
displaced step, it will wait for ever.
I think we simply need to reset the displaced stepping state of the
inferior on exec. Everything execution-related that existed before the
exec is now gone.
Similarly, if a thread does an in-line step over an exec syscall
instruction, nothing clears the in-line step over info when the event is
handled. So it the in-line step over info stays there indefinitely, and
things hang because we can never start another step over. To fix this,
I added a call to clear_step_over_info in infrun_inferior_execd.
Add a test with a program with two threads that does an exec. The test
includes the following axes:
- whether it's the leader thread or the other thread that does the exec.
- whether the exec'r and exec'd program have different text segment
addresses. This is to hopefully catch cases where the displaced
stepping info doesn't get reset, and GDB later tries to restore bytes
of the old address space in the new address space. If the mapped
addresses are different, we should get some memory error. This
happens without the patch applied:
$ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-leader-diff-text-segs-true -ex "b main" -ex r -ex "b my_execve_syscall if 0" -ex "set displaced-stepping on"
...
Breakpoint 1, main (argc=1, argv=0x7fffffffde38) at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
Breakpoint 2 at 0x60133a: file /home/simark/src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 34.
(gdb) c
Continuing.
[New Thread 0x7ffff7c62640 (LWP 1455423)]
Leader going in exec.
Exec-ing /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-leader-diff-text-segs-true-execd
[Thread 0x7ffff7c62640 (LWP 1455423) exited]
process 1455418 is executing new program: /home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-leader-diff-text-segs-true-execd
Error in re-setting breakpoint 2: Function "my_execve_syscall" not defined.
No unwaited-for children left.
(gdb) n
Single stepping until exit from function _start,
which has no line number information.
Cannot access memory at address 0x6010d2
(gdb)
- Whether displaced stepping is allowed or not, so that we end up
testing both displaced stepping and in-line stepping on arches that do
support displaced stepping (otherwise, it just tests in-line stepping
twice I suppose)
To be able to precisely put a breakpoint on the syscall instruction, I
added a small assembly file (lib/my-syscalls.S) that contains minimal
Linux syscall wrappers. I prefer that to the strategy used in
gdb.base/step-over-syscall.exp, which is to stepi into the glibc wrapper
until we find something that looks like a syscall instruction, I find
that more predictable.
gdb/ChangeLog:
* infrun.c (infrun_inferior_execd): New function.
(_initialize_infrun): Attach inferior_execd observer.
gdb/testsuite/ChangeLog:
* gdb.threads/step-over-exec.exp: New.
* gdb.threads/step-over-exec.c: New.
* gdb.threads/step-over-exec-execd.c: New.
* lib/my-syscalls.S: New.
* lib/my-syscalls.h: New.
Change-Id: I1bbc8538e683f53af5b980091849086f4fec5ff9
|
|
gdb/testsuite/ChangeLog:
* gdb.threads/non-ldr-exc-1.exp: Fix indentation.
Change-Id: I02ba8a518aae9cb67106d09bef92968a7078e91e
|