Age | Commit message (Collapse) | Author | Files | Lines |
|
When running test-case gdb.base/watchpoint-running on ppc64le-linux (and
similar on arm-linux), we get:
...
(gdb) watch global_var^M
warning: Error when detecting the debug register interface. \
Debug registers will be unavailable.^M
Watchpoint 2: global_var^M
(gdb) FAIL: $exp: all-stop: hardware: watch global_var
FAIL: $exp: all-stop: hardware: watchpoint hit (timeout)
...
The problem is that ppc_linux_dreg_interface::detect fails to detect the
hardware watchpoint interface, because the calls to ptrace return with errno
set to ESRCH.
This is a feature of ptrace: if a call is done while the tracee is not
ptrace-stopped, it returns ESRCH.
Indeed, in the test-case "watch global_var" is executed while the inferior is
running, and that triggers the first call to ppc_linux_dreg_interface::detect.
And because the detection failure is cached, subsequent attempts at setting
hardware watchpoints will also fail, even if the tracee is ptrace-stopped.
The way to fix this is to make sure that ppc_linux_dreg_interface::detect is
called when we know that the thread is ptrace-stopped, which in the current
setup is best addressed by using target-specific post_attach and
post_startup_inferior overrides. However, as we can see in
aarch64_linux_nat_target, that causes code duplication.
Fix this by:
- defining a new target hook low_init_process, called from
linux_init_ptrace_procfs, which is called from both
linux_nat_target::post_attach and linux_nat_target::post_startup_inferior,
- adding implementations for ppc_linux_nat_target and arm_linux_nat_target
that detect the hardware watchpoint interface,
- replacing the aarch64_linux_nat_target implementations of post_attach and
post_startup_inferior with a low_init_process implementation.
Tested on ppc64le-linux, arm-linux, aarch64-linux and x86_64-linux.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Approved-By: Luis Machado <luis.machado@arm.com>
PR tdep/31834
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31834
PR tdep/31705
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31705
|
|
In a later commit I want to access have_ptrace_getregset from a .c
file in the nat/ directory. To achieve this I need access to the
declaration of have_ptrace_getregset.
Currently have_ptrace_getregset is declared (and defined) twice, once
in GDB and once in gdbserver.
This commit moves the declaration into nat/linux-nat.h, but leaves the
two definitions where they are. Now, in my later commit, I can pull
in the declaration from nat/linux-nat.h.
There should be no user visible changes after this commit.
Approved-By: Felix Willgerodt <felix.willgerodt@intel.com>
|
|
This changes target_ops::thread_events and target_thread_events to use
'bool'. The callers were already doing this.
Tested by rebuilding.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
Old RHEL systems have a kernel that does not support writing memory
via /proc/pid/mem. On such systems, we fallback to accessing memory
via ptrace. That has a few downsides described in the "Accessing
inferior memory" section at the top of linux-nat.c.
The target_xfer interface for memory access uses inferior_ptid as
sideband argument to indicate which process to access. Memory access
is process-wide, it is not thread-specific, so inferior_ptid is
sometimes pointed at a process-wide ptid_t for the memory access
(i.e., a ptid that looks like {pid, 0, 0}). That is the case for any
code that uses scoped_restore_current_inferior_for_memory, for
example.
That is what causes the issue described in PR 31579, where thread_db
calls into the debugger to read memory, which reaches our
ps_xfer_memory function, which does:
static ps_err_e
ps_xfer_memory (const struct ps_prochandle *ph, psaddr_t addr,
gdb_byte *buf, size_t len, int write)
{
scoped_restore_current_inferior_for_memory save_inferior (ph->thread->inf);
...
ret = target_read_memory (core_addr, buf, len);
...
}
If linux_nat_target::xfer_partial falls back to inf_ptrace_target with
a pid-ptid, then the ptrace code will do the ptrace call targeting
pid, the leader LWP. That may fail with ESRCH if the leader is
currently running, or zombie. That is the case in the scenario in
question, because thread_db is consulted for an event of a non-leader
thread, before we've stopped the whole process.
Fix this by having the ptrace fallback code try to find a stopped LWP
to use with ptrace.
I chose to handle this in the linux-nat target instead of in common
code because (global) memory is a process-wide property, and this
avoids having to teach all the code paths that use
scoped_restore_current_inferior_for_memory to find some stopped thread
to access memory through, which is a ptrace quirk. That is
effectively what we used to do before we started relying on writable
/proc/pid/mem. I'd rather not go back there.
To trigger this on modern kernels you have to hack linux-nat.c to
force the ptrace fallback code, like so:
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -3921,7 +3921,7 @@ linux_nat_target::xfer_partial (enum target_object object,
poke would incorrectly write memory to the post-exec address
space, while the core was trying to write to the pre-exec
address space. */
- if (proc_mem_file_is_writable ())
+ if (0 && proc_mem_file_is_writable ())
With that hack, I was able to confirm that the fix fixes hundreds of
testsuite failures. Compared to a test run with pristine master, the
hack above + this commit's fix shows that some non-stop-related tests
fail, but that is expected, because those are tests that need to
access memory while the program is running. (I made no effort to
temporarily pause an lwp if no ptrace-stopped lwp is found.)
Change-Id: I24a4f558e248aff7bc7c514a88c698f379f23180
Tested-By: Hannes Domani <ssbssa@yahoo.de>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Most files including gdbcmd.h currently rely on it to access things
actually declared in cli/cli-cmds.h (setlist, showlist, etc). To make
things easy, replace all includes of gdbcmd.h with includes of
cli/cli-cmds.h. This might lead to some unused includes of
cli/cli-cmds.h, but it's harmless, and much faster than going through
the 170 or so files by hand.
Change-Id: I11f884d4d616c12c05f395c98bbc2892950fb00f
Approved-By: Tom Tromey <tom@tromey.com>
|
|
It's been over 9 years (since commit faf09f0119da) since Linux GDB and
GDBserver started relying on SIGTRAP si_code to tell whether a
breakpoint triggered, which is important for non-stop mode. When that
then-new code was added, I had left the then-old code as fallback, in
case some architectured still needed it. Given AFAIK there haven't
been complaints since, this commit finally removes the fallback code,
along with USE_SIGTRAP_SIGINFO.
Change-Id: I140a5333a9fe70e90dbd186aca1f081549b2e63d
|
|
I've noticed that doc strings of some commands, like "set cwd"
and "set inferior-tty", have some excess whitespace, which
makes them display with unexpected indentation, at least in a
Windows command prompt window. This patch fixes that.
* gdb/linux-nat.c (_initialize_linux_nat):
* gdb/riscv-tdep.c (riscv_insn):
* gdb/top.c (quit_force):
* gdb/infcmd.c (_initialize_infcmd): Remove excess whitespace.
|
|
Now that defs.h, server.h and common-defs.h are included via the
`-include` option, it is no longer necessary for source files to include
them. Remove all the inclusions of these files I could find. Update
the generation scripts where relevant.
Change-Id: Ia026cff269c1b7ae7386dd3619bc9bb6a5332837
Approved-By: Pedro Alves <pedro@palves.net>
|
|
PR gdb/31259 reveals one scenario where we run into a
heap-use-after-free reported by thread sanitizer, while running
gdb.base/vfork-follow-parent.exp.
The heap-use-after-free happens during the following scenario:
- linux_nat_wait_1 is about to return an event for T2. It stops all
other threads, and while doing so, stop_wait_callback -> wait_lwp
sees T1 exit, and decides to leave the exit event pending. It
should have set the lp->stopped flag too, but does not -- this is
the bug.
- The event for T2 is reported, is processed by infrun, and we're
back at linux_nat_wait_1.
- linux_nat_wait_1 selects LWP T1 with the pending exit status to
report.
- it sets variable lp to point to the corresponding lwp_info.
- it calls stop_callback and stop_wait_callback for all threads
(because !target_is_non_stop_p ()).
- it calls select_event_lwp to maybe pick another thread than T1, to
prevent starvation.
The problem is the following:
- while calling stop_wait_callback for all threads, it also does this
for T1. While doing so, the corresponding lwp_info is deleted
(callstack stop_wait_callback -> wait_lwp -> exit_lwp ->
delete_lwp), leaving variable lp as a dangling pointer.
- variable lp is passed to select_event_lwp, which derefences it,
which causes the heap-use-after-free.
Note that the comment here mentions "all other LWP's":
...
/* Now stop all other LWP's ... */
iterate_over_lwps (minus_one_ptid, stop_callback);
/* ... and wait until all of them have reported back that
they're no longer running. */
iterate_over_lwps (minus_one_ptid, stop_wait_callback);
...
The reason the comments say "all other LWP's", and doesn't bother
filtering out LP is that lp->stopped should be true at this point, and
the callbacks (both stop_callback and stop_wait_callback) check that
flag, and do nothing if set. I.e., they skip already-stopped threads,
so they should skip LP.
In this particular scenario, though, we missed setting the stopped
flag right in the first step described above, so LP was iterated over
incorrectly.
The fix is to make wait_lwp set the lp->stopped flag when it decides
to leave the exit event pending. However, going a bit further,
gdbserver has a mark_lwp_dead function to centralize setting up
various lwp flags such that the rest of the code doesn't mishandle
them, and it seems like a good idea to do a similar thing in gdb as
well. That is what this patch does.
PR gdb/31259
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9
|
|
This commit adds a new "Accessing inferior memory" comment section to
gdb/linux-nat.c that explains why we prefer /proc/pid/mem over
alternatives.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30453
Change-Id: I575b21ed697a85f3ff4c0ec58c04812db5005b76
|
|
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh it's copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
|
|
When using GDB on native linux, it can happen that, while attempting
to detach an inferior, the inferior may have been exited or have been
killed, yet still be in the list of lwps. Should that happen, the
assert in x86_linux_update_debug_registers in
gdb/nat/x86-linux-dregs.c will trigger. The line in question looks
like this:
gdb_assert (lwp_is_stopped (lwp));
For this case, the lwp isn't stopped - it's dead.
The bug which brought this problem to my attention is one in which the
pwntools library uses GDB to to debug a process; as the script is
shutting things down, it kills the process that GDB is debugging and
also sends GDB a SIGTERM signal, which causes GDB to detach all
inferiors prior to exiting. Here's a link to the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2192169
The following shell command mimics part of what the pwntools
reproducer script does (with regard to shutting things down), but
reproduces the bug much less reliably. I have found it necessary to
run the command a bunch of times before seeing the bug. (I usually
see it within 5-10 repetitions.) If you choose to try this command,
make sure that you have no running "cat" or "gdb" processes first!
cat </dev/zero >/dev/null & \
(sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \
sleep 1 ; \
gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \
-ex c /usr/bin/cat `pgrep cat`
So, basically, the idea here is to kill both gdb and cat at roughly
the same time. If we happen to attempt the detach before the process
lwp has been deleted from GDB's (linux native) LWP data structures,
then the assert will trigger. The relevant part of the backtrace
looks like this:
#8 0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280)
at gdb/nat/x86-linux-dregs.c:146
#9 0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280)
at gdb/nat/x86-linux.c:81
#10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume (
this=0x121eee0 <the_amd64_linux_nat_target>, lwp=0x1873280)
at gdb/x86-linux-nat.h:70
#11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c)
at gdb/linux-nat.c:1374
#12 0x000000000081a85f in linux_nat_target::detach (
this=0x121eee0 <the_amd64_linux_nat_target>, inf=0x16e8f70, from_tty=0)
at gdb/linux-nat.c:1450
#13 0x000000000083a23b in thread_db_target::detach (
this=0x1206ae0 <the_thread_db_target>, inf=0x16e8f70, from_tty=0)
at gdb/linux-thread-db.c:1385
#14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0)
at gdb/target.c:2526
#15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0)
at gdb/top.c:1659
#16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0)
at gdb/top.c:1762
#17 0x000000000070829c in async_sigterm_handler (arg=0x0)
at gdb/event-top.c:1141
My colleague, Andrew Burgess, has done some recent work on other
problems with detach. Upon hearing of this problem, he came up a test
case which reliably reproduces the problem and tests for a few other
problems as well. In addition to testing detach when the inferior has
terminated due to a signal, it also tests detach when the inferior has
exited normally. Andrew observed that the linux-native-only
"checkpoint" command would be affected too, so the test also tests
those cases when there's an active checkpoint.
For the LWP exit / termination case with no checkpoint, that's handled
via newly added checks of the waitstatus in detach_one_lwp in
linux-nat.c.
For the checkpoint detach problem, I chose to pass the lwp_info
to linux_fork_detach in linux-fork.c. With that in place, suitable
tests were added before attempting a PTRACE_DETACH operation.
I added a few asserts at the beginning of linux_fork_detach and
modified the caller code so that the newly added asserts shouldn't
trigger. (That's what the 'pid == inferior_ptid.pid' check is about
in gdb/linux-nat.c.)
Lastly, I'll note that the checkpoint code needs some work with regard
to background execution. This patch doesn't attempt to fix that
problem, but it doesn't make it any worse. It does slightly improve
the situation with detach because, due to the check noted above,
linux_fork_detach() won't be called for the wrong inferior when there
are multiple inferiors. (There are at least two other problems with
the checkpoint code when there are multiple inferiors. See:
https://sourceware.org/bugzilla/show_bug.cgi?id=31065)
This commit also adds a new test,
gdb.base/process-dies-while-detaching.exp. Andrew Burgess is the
primary author of this test case. Its design is similar to that of
gdb.threads/main-thread-exit-during-detach.exp, which was also written
by Andrew.
This test checks that GDB correctly handles several cases that can
occur when GDB attempts to detach an inferior process. The process
can exit or be terminated (e.g. via SIGKILL) prior to GDB's event
loop getting a chance to remove it from GDB's internal data
structures. To complicate things even more, detach works differently
when a checkpoint (created via GDB's "checkpoint" command) exists for
the inferior. This test checks all four possibilities: process exit
with no checkpoint, process termination with no checkpoint, process
exit with a checkpoint, and process termination with a checkpoint.
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
On Linux, threads are treated much like separate processes by the
kernel. In particular, it's possible to ptrace just a single thread.
If gdb tries to attach to a multi-threaded inferior, where a non-main
thread is already being traced (e.g., by strace), then gdb will get
into an infinite loop attempting to attach.
This patch fixes this problem by having the attach fail if ptrace
fails to attach to any thread of the inferior.
|
|
When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
on s390x), I run into:
...
(gdb) PASS: gdb.base/vfork-follow-parent.exp: \
exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
resolution_method=schedule-multiple: print unblock_parent = 1
continue^M
Continuing.^M
Reading symbols from vfork-follow-parent-exit...^M
^M
^M
Fatal signal: Segmentation fault^M
----- Backtrace -----^M
0x1027d3e7 gdb_internal_backtrace_1^M
src/gdb/bt-utils.c:122^M
0x1027d54f _Z22gdb_internal_backtracev^M
src/gdb/bt-utils.c:168^M
0x1057643f handle_fatal_signal^M
src/gdb/event-top.c:889^M
0x10576677 handle_sigsegv^M
src/gdb/event-top.c:962^M
0x3fffa7610477 ???^M
0x103f2144 for_each_block^M
src/gdb/dcache.c:199^M
0x103f235b _Z17dcache_invalidateP13dcache_struct^M
src/gdb/dcache.c:251^M
0x10bde8c7 _Z24target_dcache_invalidatev^M
src/gdb/target-dcache.c:50^M
...
or similar.
The root cause for the segmentation fault is that linux_is_uclinux gives an
incorrect result: it should always return false, given that we're running on a
regular linux system, but instead it returns first true, then false.
In more detail, the segmentation fault happens as follows:
- a program space with an address space is created
- a second program space is about to be created. maybe_new_address_space
is called, and because linux_is_uclinux returns true, maybe_new_address_space
returns false, and no new address space is created
- a second program space with the same address space is created
- a program space is deleted. Because linux_is_uclinux now returns false,
gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
false, and the address space is deleted
- when gdb uses the address space of the remaining program space, we run into
the segfault, because the address space is deleted.
Hardcoding linux_is_uclinux to false makes the test-case pass.
We leave addressing the root cause for the following commit in this series.
For now, prevent the segmentation fault by making the address space a refcounted
object.
This was already suggested here [1]:
...
A better solution might be to have the address spaces be reference counted
...
Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.
Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca>
PR gdb/30547
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
[1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
|
|
Since GDB now requires C++17, we don't need the internally maintained
gdb::optional implementation. This patch does the following replacing:
- gdb::optional -> std::optional
- gdb::in_place -> std::in_place
- #include "gdbsupport/gdb_optional.h" -> #include <optional>
This change has mostly been done automatically. One exception is
gdbsupport/thread-pool.* which did not use the gdb:: prefix as it
already lives in the gdb namespace.
Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9
Approved-By: Tom Tromey <tom@tromey.com>
Approved-By: Pedro Alves <pedro@palves.net>
|
|
While looking at the regcache code, I noticed that the address space
(passed to regcache when constructing it, and available through
regcache::aspace) wasn't relevant for the regcache itself. Callers of
regcache::aspace use that method because it appears to be a convenient
way of getting the address space for a thread, if you already have the
regcache. But there is always another way to get the address space, as
the callers pretty much always know which thread they are dealing with.
The regcache code itself doesn't use the address space.
This patch removes anything related to address_space from the regcache
code, and updates callers to get it from the thread in context. This
removes a bit of unnecessary complexity from the regcache code.
The current get_thread_arch_regcache function gets an address_space for
the given thread using the target_thread_address_space function (which
calls the target_ops::thread_address_space method). This suggest that
there might have been the intention of supporting per-thread address
spaces. But digging through the history, I did not find any such case.
Maybe this method was just added because we needed a way to get an
address space from a ptid (because constructing a regcache required an
address space), and this seemed like the right way to do it, I don't
know.
The only implementations of thread_address_space and
process_stratum_target::thread_address_space and
linux_nat_target::thread_address_space, which essentially just return
the inferior's address space. And thread_address_space is only used in
the current get_thread_arch_regcache, which gets removed. So, I think
that the thread_address_space target method can be removed, and we can
assume that it's fine to use the inferior's address space everywhere.
Callers of regcache::aspace are updated to get the address space from
the relevant inferior, either using some context they already know
about, or in last resort using the current global context.
So, to summarize:
- remove everything in regcache related to address spaces
- in particular, remove get_thread_arch_regcache, and rename
get_thread_arch_aspace_regcache to get_thread_arch_regcache
- remove target_ops::thread_address_space, and
target_thread_address_space
- adjust all users of regcache::aspace to get the address space another
way
Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
|
|
This implements support for the new GDB_THREAD_OPTION_EXIT thread
option for native Linux.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ia69fc0b9b96f9af7de7cefc1ddb1fba9bbb0bb90
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
|
|
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is
reported, the corresponding GDB thread has already been removed from
the GDB thread list.
Later in the series, that will no longer work, as infrun will need to
refer to the thread's thread_info when it processes
TARGET_WAITKIND_THREAD_EXITED.
As preparation, this patch makes deleting the GDB thread
responsibility of infrun, instead of the target.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
|
|
This commit teaches the native Linux target about the
GDB_THREAD_OPTION_CLONE thread option. It's actually simpler to just
continue reporting all clone events unconditionally to the core.
There's never any harm in reporting a clone event when the option is
disabled. All we need to do is to report support for the option,
otherwise GDB falls back to use target_thread_events().
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: If90316e2dcd0c61d0fefa0d463c046011698acf9
|
|
(A good chunk of the problem statement in the commit log below is
Andrew's, adjusted for a different solution, and for covering
displaced stepping too. The testcase is mostly Andrew's too.)
This commit addresses bugs gdb/19675 and gdb/27830, which are about
stepping over a breakpoint set at a clone syscall instruction, one is
about displaced stepping, and the other about in-line stepping.
Currently, when a new thread is created through a clone syscall, GDB
sets the new thread running. With 'continue' this makes sense
(assuming no schedlock):
- all-stop mode, user issues 'continue', all threads are set running,
a newly created thread should also be set running.
- non-stop mode, user issues 'continue', other pre-existing threads
are not affected, but as the new thread is (sort-of) a child of the
thread the user asked to run, it makes sense that the new threads
should be created in the running state.
Similarly, if we are stopped at the clone syscall, and there's no
software breakpoint at this address, then the current behaviour is
fine:
- all-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). While stepping the thread
of interest all the other threads will be allowed to continue. A
newly created thread will be set running, and then stopped once the
thread of interest has completed its step.
- non-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). Other threads might be
running or stopped, but as with the continue case above, the new
thread will be created running. The only possible issue here is
that the new thread will be left running after the initial thread
has completed its stepi. The user would need to manually select
the thread and interrupt it, this might not be what the user
expects. However, this is not something this commit tries to
change.
The problem then is what happens when we try to step over a clone
syscall if there is a breakpoint at the syscall address.
- For both all-stop and non-stop modes, with in-line stepping:
+ user issues 'stepi',
+ [non-stop mode only] GDB stops all threads. In all-stop mode all
threads are already stopped.
+ GDB removes s/w breakpoint at syscall address,
+ GDB single steps just the thread of interest, all other threads
are left stopped,
+ New thread is created running,
+ Initial thread completes its step,
+ [non-stop mode only] GDB resumes all threads that it previously
stopped.
There are two problems in the in-line stepping scenario above:
1. The new thread might pass through the same code that the initial
thread is in (i.e. the clone syscall code), in which case it will
fail to hit the breakpoint in clone as this was removed so the
first thread can single step,
2. The new thread might trigger some other stop event before the
initial thread reports its step completion. If this happens we
end up triggering an assertion as GDB assumes that only the
thread being stepped should stop. The assert looks like this:
infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
- For both all-stop and non-stop modes, with displaced stepping:
+ user issues 'stepi',
+ GDB starts the displaced step, moves thread's PC to the
out-of-line scratch pad, maybe adjusts registers,
+ GDB single steps the thread of interest, [non-stop mode only] all
other threads are left as they were, either running or stopped.
In all-stop, all other threads are left stopped.
+ New thread is created running,
+ Initial thread completes its step, GDB re-adjusts its PC,
restores/releases scratchpad,
+ [non-stop mode only] GDB resumes the thread, now past its
breakpoint.
+ [all-stop mode only] GDB resumes all threads.
There is one problem with the displaced stepping scenario above:
3. When the parent thread completed its step, GDB adjusted its PC,
but did not adjust the child's PC, thus that new child thread
will continue execution in the scratch pad, invoking undefined
behavior. If you're lucky, you see a crash. If unlucky, the
inferior gets silently corrupted.
What is needed is for GDB to have more control over whether the new
thread is created running or not. Issue #1 above requires that the
new thread not be allowed to run until the breakpoint has been
reinserted. The only way to guarantee this is if the new thread is
held in a stopped state until the single step has completed. Issue #3
above requires that GDB is informed of when a thread clones itself,
and of what is the child's ptid, so that GDB can fixup both the parent
and the child.
When looking for solutions to this problem I considered how GDB
handles fork/vfork as these have some of the same issues. The main
difference between fork/vfork and clone is that the clone events are
not reported back to core GDB. Instead, the clone event is handled
automatically in the target code and the child thread is immediately
set running.
Note we have support for requesting thread creation events out of the
target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported
for the new/child thread. That would be sufficient to address in-line
stepping (issue #1), but not for displaced-stepping (issue #3). To
handle displaced-stepping, we need an event that is reported to the
_parent_ of the clone, as the information about the displaced step is
associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED
includes no indication of which thread is the parent that spawned the
new child. In fact, for some targets, like e.g., Windows, it would be
impossible to know which thread that was, as thread creation there
doesn't work by "cloning".
The solution implemented here is to model clone on fork/vfork, and
introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is
similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except
that we end up with a new thread in the same process, instead of a new
thread of a new process. Like FORKED and VFORKED, THREAD_CLONED
waitstatuses have a child_ptid property, and the child is held stopped
until GDB explicitly resumes it. This addresses the in-line stepping
case (issues #1 and #2).
The infrun code that handles displaced stepping fixup for the child
after a fork/vfork event is thus reused for THREAD_CLONE, with some
minimal conditions added, addressing the displaced stepping case
(issue #3).
The native Linux backend is adjusted to unconditionally report
TARGET_WAITKIND_THREAD_CLONED events to the core.
Following the follow_fork model in core GDB, we introduce a
target_follow_clone target method, which is responsible for making the
new clone child visible to the rest of GDB.
Subsequent patches will add clone events support to the remote
protocol and gdbserver.
displaced_step_in_progress_thread becomes unused with this patch, but
a new use will reappear later in the series. To avoid deleting it and
readding it back, this patch marks it with attribute unused, and the
latter patch removes the attribute again. We need to do this because
the function is static, and with no callers, the compiler would warn,
(error with -Werror), breaking the build.
This adds a new gdb.threads/stepi-over-clone.exp testcase, which
exercises stepping over a clone syscall, with displaced stepping vs
inline stepping, and all-stop vs non-stop. We already test stepping
over clone syscalls with gdb.base/step-over-syscall.exp, but this test
uses pthreads, while the other test uses raw clone, and this one is
more thorough. The testcase passes on native GNU/Linux, but fails
against GDBserver. GDBserver will be fixed by a later patch in the
series.
Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
I noticed that on an Ubuntu 20.04 system, after a following patch
("Step over clone syscall w/ breakpoint,
TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
was passing cleanly, but still, we'd end up with four new unexpected
GDB core dumps:
=== gdb Summary ===
# of unexpected core files 4
# of expected passes 48
That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
#1 - a non-leader thread execs
#2 - the post-exec program stops somewhere
#3 - you kill the inferior
Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.
Vis (after said patch is applied):
$ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
...
(top-gdb) r
...
(gdb) b main
...
(gdb) r
...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
(gdb) c
Continuing.
[New Thread 0x7ffff7d89700 (LWP 2506975)]
Other going in exec.
Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
28 foo ();
(gdb) k
...
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
393 return m_suspend.waitstatus_pending_p;
(top-gdb) bt
#0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
#1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
#2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
#3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
#4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
#5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
#6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
#7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
#8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
...
The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without trace. The
ptrace man page says:
"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the tracee at the next execve(2). A waitpid(2) by the
tracer will return a status value such that
status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
If the execing thread is not a thread group leader, the thread
ID is reset to thread group leader's ID before this stop.
Since Linux 3.0, the former thread ID can be retrieved with
PTRACE_GETEVENTMSG."
When the core of GDB processes an exec events, it deletes all the
threads of the inferior. But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end leaving the pre-exec
non-leader LWP stale in the LWP list. That's what leads to the crash
above -- linux_nat_target::kill iterates over all LWPs, and after the
patch in question, that code will look for the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, won't find one.
This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.
GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:
else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
{
...
/* Delete the execing process and all its threads. */
mourn (proc);
switch_to_thread (nullptr);
The crash with gdb.threads/step-over-exec.exp is not observable on
newer systems, which postdate the glibc change to move "libpthread.so"
internals to "libc.so.6", because right after the exec, GDB traps a
load event for "libc.so.6", which leads to GDB trying to open
libthread_db for the post-exec inferior, and, on such systems that
succeeds. When we load libthread_db, we call
linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
lwps, and then waits to see their stops. While doing this, GDB
detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the
"libc.so.6" load event ever happens, and issue "kill" right there,
then GDB crashes on newer systems as well. So instead of tweaking
gdb.threads/step-over-exec.exp to cover the fix, add a new
gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
The test also uses the new "maint info linux-lwps" command if testing
on Linux native, which also exposes the stale LWP problem with an
unfixed GDB.
Also tweak a comment in infrun.c:follow_exec referring to how
linux-nat.c used to behave, as it would become stale otherwise.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
|
|
This adds a maintenance command that lets you list all the LWPs under
control of the linux-nat target.
For example:
(gdb) maint info linux-lwps
LWP Ptid Thread ID
560948.561047.0 None
560948.560948.0 1.1
This shows that "560948.561047.0" LWP doesn't map to any thread_info
object, which is bogus. We'll be using this in a testcase in a
following patch.
Co-Authored-By: Pedro Alves <pedro@palves.net>
Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
|
|
Overview
========
Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped. The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.
This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.
This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correct for native and remote
targets.
Native Linux Target
===================
For the native Linux target, detaching is handled in the function
linux_nat_target::detach. In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.
GDB then detaches from everything except the main thread by calling
detach_callback.
After this the first problem is this assert:
/* Only the initial process should be left right now. */
gdb_assert (num_lwps (pid) == 1);
The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached. I fix this by changing
the assert to allow for 0 or 1 lwps at this point. As the 0 case can
only happen in non-stop mode, the assert becomes:
gdb_assert (num_lwps (pid) == 1
|| (target_is_non_stop_p () && num_lwps (pid) == 0));
The next problem is that we do:
main_lwp = find_lwp_pid (ptid_t (pid));
and then proceed assuming that main_lwp is not nullptr. In the case
that the main thread has exited though, main_lwp will be nullptr.
However, we only need main_lwp so that GDB can detach from the
thread. If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.
For Remote Targets
==================
On remote targets there are two problems.
First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach. The 'set debug remote on' trace
looks like this:
[remote] Sending packet: $z0,7f1648fe0241,1#35
[remote] Notification received: Stop:W0;process:2a0ac8
# At this point an unpatched gdbserver segfaults, and the connection
# is broken. A patched gdbserver continues as below...
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff00a8,1#68
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff132f,1#6b
[remote] Packet received: E01
[remote] Sending packet: $D;2a0ac8#3e
[remote] Packet received: E01
I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This
function calls current_process() and then dereferences the result to
find the breakpoint list.
However, in our case, the current process has already exited, and so
the current_process() call returns nullptr. At the point of failure,
the gdbserver backtrace looks like this:
#0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
#1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
#2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
#3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
...
The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent. Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.
The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails. This was being
reported to the user with a 'Can't detach process' error. As the test
actually calls detach from Python code, this error was then becoming a
Python exception.
Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.
We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach. In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.
Testing
=======
In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call. The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.
However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call. And so that is the
approach I've taken for testing this patch.
Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
|
|
This function is just a wrapper around the current inferior's gdbarch.
I find that having that wrapper just obscures where the arch is coming
from, and that it's often used as "I don't know which arch to use so
I'll use this magical target_gdbarch function that gets me an arch" when
the arch should in fact come from something in the context (a thread,
objfile, symbol, etc). I think that removing it and inlining
`current_inferior ()->arch ()` everywhere will make it a bit clearer
where that arch comes from and will trigger people into reflecting
whether this is the right place to get the arch or not.
Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b
Reviewed-By: John Baldwin <jhb@FreeBSD.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
I noticed a comment by an include and remembered that I think these
don't really provide much value -- sometimes they are just editorial,
and sometimes they are obsolete. I think it's better to just remove
them. Tested by rebuilding.
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Currently, each target backend is responsible for printing "[Thread
...exited]" before deleting a thread. This leads to unnecessary
differences between targets, like e.g. with the remote target, we
never print such messages, even though we do print "[New Thread ...]".
E.g., debugging the gdb.threads/attach-many-short-lived-threads.exp
with gdbserver, letting it run for a bit, and then pressing Ctrl-C, we
currently see:
(gdb) c
Continuing.
^C[New Thread 3850398.3887449]
[New Thread 3850398.3887500]
[New Thread 3850398.3887551]
[New Thread 3850398.3887602]
[New Thread 3850398.3887653]
...
Thread 1 "attach-many-sho" received signal SIGINT, Interrupt.
0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c
(gdb)
Above, we only see "New Thread" notifications, even though threads
were deleted.
After this patch, we'll see:
(gdb) c
Continuing.
^C[Thread 3558643.3577053 exited]
[Thread 3558643.3577104 exited]
[Thread 3558643.3577155 exited]
[Thread 3558643.3579603 exited]
...
[New Thread 3558643.3597415]
[New Thread 3558643.3600015]
[New Thread 3558643.3599965]
...
Thread 1 "attach-many-sho" received signal SIGINT, Interrupt.
0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c
(gdb) q
This commit fixes this by moving the thread exit printing to common
code instead, triggered from within delete_thread (or rather,
set_thread_exited).
There's one wrinkle, though. While most targest want to print:
[Thread ... exited]
the Windows target wants to print:
[Thread ... exited with code <exit_code>]
... and sometimes wants to suppress the notification for the main
thread. To address that, this commits adds a delete_thread_with_code
function, only used by that target (so far).
This fix was originally posted as part of a larger series:
https://inbox.sourceware.org/gdb-patches/20221212203101.1034916-1-pedro@palves.net/
But didn't really need to be part of that series. In order to get
this fix merged sooner, I (Andrew Burgess) have rebased this commit
outside of the original series. Any bugs introduced while splitting
this patch out and rebasing, are entirely my own.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30129
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
|
|
This commit adjusts some of the debug output in linux-nat.c, but makes
no other functional changes to GDB.
In resume_lwp I've added the word "sibling" to one of the debug
messages. All the other debug messages in this function talk about
operating on the sibling thread, so I think it makes sense, for
consistency, if the message I've updated also talks about the sibling
thread.
In resume_stopped_resumed_lwps I've reordered the condition checks so
that the vfork-parent check now happens after the checks for whether
the thread is already resumed or not. This makes no functional
difference to GDB, but does, I think, mean we see more helpful debug
messages first.
Consider the situation where a vfork-parent thread is already resumed,
and resume_stopped_resumed_lwps is called. Previously the message
saying that the thread was not being resumed due to being a
vfork-parent, was printed. This might give the impression that the
thread is left in a not resumed state, which is misleading.
After this change we now get a message saying that the thread is not
being resumed due to it not being stopped (i.e. is already resumed).
With this message the already resumed nature of the thread is much
clearer.
I found this change helpful when debugging some vfork related issues.
|
|
While working on some of the recent patches relating to vfork handling
I found this debug output very helpful, I'd like to propose adding
this into GDB.
With debug turned off there should be no user visible changes after
this commit.
|
|
While working on a later patch, which changes gdb.base/foll-vfork.exp,
I noticed that sometimes I would hit this assert:
x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
I eventually tracked it down to a combination of schedule-multiple
mode being on, target-non-stop being off, follow-fork-mode being set
to child, and some bad timing. The failing case is pretty simple, a
single threaded application performs a vfork, the child process then
execs some other application while the parent process (once the vfork
child has completed its exec) just exits. As best I understand
things, here's what happens when things go wrong:
1. The parent process performs a vfork, GDB sees the VFORKED event
and creates an inferior and thread for the vfork child,
2. GDB resumes the vfork child process. As schedule-multiple is on
and target-non-stop is off, this is translated into a request to
start all processes (see user_visible_resume_ptid),
3. In the linux-nat layer we spot that one of the threads we are
about to start is a vfork parent, and so don't start that
thread (see resume_lwp), the vfork child thread is resumed,
4. GDB waits for the next event, eventually entering
linux_nat_target::wait, which in turn calls linux_nat_wait_1,
5. In linux_nat_wait_1 we eventually call
resume_stopped_resumed_lwps, this should restart threads that have
stopped but don't actually have anything interesting to report.
6. Unfortunately, resume_stopped_resumed_lwps doesn't check for
vfork parents like resume_lwp does, so at this point the vfork
parent is resumed. This feels like the start of the bug, and this
is where I'm proposing to fix things, but, resuming the vfork parent
isn't the worst thing in the world because....
7. As the vfork child is still alive the kernel holds the vfork
parent stopped,
8. Eventually the child performs its exec and GDB is sent and EXECD
event. However, because the parent is resumed, as soon as the child
performs its exec the vfork parent also sends a VFORK_DONE event to
GDB,
9. Depending on timing both of these events might seem to arrive in
GDB at the same time. Normally GDB expects to see the EXECD or
EXITED/SIGNALED event from the vfork child before getting the
VFORK_DONE in the parent. We know this because it is as a result of
the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see
handle_vfork_child_exec_or_exit for details). Further the comment
in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that
when we remain attached to the child (not the parent) we should not
expect to see a VFORK_DONE,
10. If both events arrive at the same time then GDB will randomly
choose one event to handle first, in some cases this will be the
VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB
expects that (a) the vfork child has finished, however, in this case
this is not completely true, the child has finished, but GDB has not
processed the event associated with the completion yet, and (b) upon
seeing a VFORK_DONE GDB assumes we are remaining attached to the
parent, and so resumes the parent process,
11. GDB now handles the EXECD event. In our case we are detaching
from the parent, so GDB calls target_detach (see
handle_vfork_child_exec_or_exit),
12. While this has been going on the vfork parent is executing, and
might even exit,
13. In linux_nat_target::detach the first thing we do is stop all
threads in the process we're detaching from, the result of the stop
request will be cached on the lwp_info object,
14. In our case the vfork parent has exited though, so when GDB
waits for the thread, instead of a stop due to signal, we instead
get a thread exited status,
15. Later in the detach process we try to resume the threads just
prior to making the ptrace call to actually detach (see
detach_one_lwp), as part of the process to resume a thread we try to
touch some registers within the thread, and before doing this GDB
asserts that the thread is stopped,
16. An exited thread is not classified as stopped, and so the assert
triggers!
So there's two bugs I see here. The first, and most critical one here
is in step #6. I think that resume_stopped_resumed_lwps should not
resume a vfork parent, just like resume_lwp doesn't resume a vfork
parent.
With this change in place the vfork parent will remain stopped in step
instead GDB will only see the EXECD/EXITED/SIGNALLED event. The
problems in #9 and #10 are therefore skipped and we arrive at #11,
handling the EXECD event. As the parent is still stopped #12 doesn't
apply, and in #13 when we try to stop the process we will see that it
is already stopped, there's no risk of the vfork parent exiting before
we get to this point. And finally, in #15 we are safe to poke the
process registers because it will not have exited by this point.
However, I did mention two bugs.
The second bug I've not yet managed to actually trigger, but I'm
convinced it must exist: if we forget vforks for a moment, in step #13
above, when linux_nat_target::detach is called, we first try to stop
all threads in the process GDB is detaching from. If we imagine a
multi-threaded inferior with many threads, and GDB running in non-stop
mode, then, if the user tries to detach there is a chance that thread
could exit just as linux_nat_target::detach is entered, in which case
we should be able to trigger the same assert.
But, like I said, I've not (yet) managed to trigger this second bug,
and even if I could, the fix would not belong in this commit, so I'm
pointing this out just for completeness.
There's no test included in this commit. In a couple of commits time
I will expand gdb.base/foll-vfork.exp which is when this bug would be
exposed. Unfortunately there are at least two other bugs (separate
from the ones discussed above) that need fixing first, these will be
fixed in the next commits before the gdb.base/foll-vfork.exp test is
expanded.
If you do want to reproduce this failure then you will for certainly
need to run the gdb.base/foll-vfork.exp test in a loop as the failures
are all very timing sensitive. I've found that running multiple
copies in parallel makes the failure more likely to appear, I usually
run ~6 copies in parallel and expect to see a failure after within
10mins.
|
|
Since commit 05c06f318fd9 ("Linux: Access memory even if threads are
running"), GDB prefers pread64/pwrite64 to access inferior memory
instead of ptrace. That change broke reading shared libraries on
SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared
libraries on SPARC64").
On SPARC64 Linux, surprisingly (to me), userspace shared libraries are
mapped at high 64-bit addresses:
(gdb) info sharedlibrary
Cannot access memory at address 0xfff80001002011e0
Cannot access memory at address 0xfff80001002011d8
Cannot access memory at address 0xfff80001002011d8
From To Syms Read Shared Object Library
0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2
(*): Shared library is missing debugging information.
Those addresses are 64-bit addresses with the high bits set. When
interpreted as signed, they're negative.
The Linux kernel rejects pread64/pwrite64 if the offset argument of
type off_t (a signed type) is negative, which happens if the memory
address we're accessing has its high bit set. See
linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux.
Thankfully, lseek does not fail in that situation. So the fix is to
use the 'lseek + read|write' path if the offset would be negative.
Fix this in both native GDB and GDBserver.
Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525
Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
|
|
Fix some more typos:
- distinquish -> distinguish
- actualy -> actually
- singe -> single
- frash -> frame
- chid -> child
- dissassembler -> disassembler
- uninitalized -> uninitialized
- precontidion -> precondition
- regsiters -> registers
- marge -> merge
- sate -> state
- garanteed -> guaranteed
- explictly -> explicitly
- prefices (nonstandard plural) -> prefixes
- bondary -> boundary
- formated -> formatted
- ithe -> the
- arrav -> array
- coresponding -> corresponding
- owend -> owned
- fials -> fails
- diasm -> disasm
- ture -> true
- tpye -> type
There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to
SIG_CODE_BOUNDARY_FAULT.
Tested on x86_64-linux.
|
|
Make find_thread_ptid (the overload that takes a process_stratum_target)
a method of process_stratum_target.
Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
I noticed that some debug log output printing an lwp's pending status
wasn't considering lp->waitstatus. This fixes it, by introducing a
new pending_status_str function.
Also fix the comment in gdb/linux-nat.h describing
lwp_info::waitstatus and details the description of lwp_info::status
while at it.
Change-Id: I66e5c7a363d30a925b093b195d72925ce5b6b980
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Replace spaces with tabs in a bunch of places.
Change-Id: If0f87180f1d13028dc178e5a8af7882a067868b0
|
|
I've long wanted to remove 'struct buffer', and thanks to Simon's
earlier patch, I was finally able to do so. My feeling has been that
gdb already has several decent structures available for growing
strings: std::string of course, but also obstack and even objalloc
from BFD and dyn-string from libiberty. The previous patches in this
series removed all the uses of struct buffer, so this one can remove
the code and the remaining #includes.
|
|
While investigating something else, I noticed some weird code in
agent_run_command (use of memcpy rather than strcpy). Then I noticed
that 'cmd' is used as both an in and out parameter, despite being
const.
Casting away const like this is bad. This patch removes the const and
fixes the memcpy. I also added a static assert to assure myself that
the code in gdbserver is correct -- gdbserver is passing its own
buffer directly to agent_run_command.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
|
|
As of 1bcb0708f229 ("gdb/linux-nat: Check whether /proc/pid/mem is
writable"), GDB checks if /proc/pid/mem is writable. This is done
early at GDB startup, in order to get a consistent warning, instead of
a warning that depends on whenever GDB writes to inferior memory.
PR gdb/29907 points out that some build systems (like QEMU's,
apparently) may call 'gdb --version' to check GDB's presence & its
version on the system, and that Gentoo's build process has sandboxing
which blocks the /proc/pid/mem access and thus GDB warns, which
results in build fails.
To help with that, this patch delays the /proc/pid/mem check until we
start or attach to an inferior. Ends up potentially emiting a warning
close where we already emit other ptrace- and /proc- related warnings,
which just Feels Right.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29907
Change-Id: I5537653ecfbbe76a04ab035e40e59d09b4980763
|
|
Make the inferior_ptid bubble up to linux_nat_target::xfer_partial.
Change-Id: I62dbc5734c26648bb465f449c2003c73751cd812
|
|
I noticed we could reduce duplication a bit here.
Change-Id: If24e54d1dac71b46f7c1f68a18a073d4c084b644
|
|
Not a big deal, but it seems strange to check errno instead of the
ptrace return value to know whether it succeeded.
Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
|
|
No caller cares about the value of *SIGINFO on failure. It's also
documented in the function doc that *SIGINFO is uninitialized (I
understand "untouched") on failure.
Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
|
|
Change return type to bool.
Change-Id: I1bf0360bfdd1b5994cd0f96c268d806f96fe51a4
|
|
No behavior change expected.
Change-Id: Ifaa64ecd619483646b024fd7c62e571e92a8eedb
|
|
Add a pid parameter to linux_proc_xfer_memory_partial, making the
inferior_ptid reference bubble up close to the target_ops::xfer_partial
boundary. No behavior change expected.
Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
|
|
linux_handle_extended_wait calls target_post_attach if we're handling
a PTRACE_EVENT_CLONE, and libthread_db.so isn't active.
target_post_attach just calls linux_init_ptrace_procfs to set the
lwp's ptrace options. However, this is completely unnecessary,
because, as man ptrace [1] says, options are inherited:
"Flags are inherited by new tracees created and "auto-attached" via
active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE
options."
This removes the unnecessary call.
[1] - https://man7.org/linux/man-pages/man2/ptrace.2.html
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
|
|
Currently, every internal_error call must be passed __FILE__/__LINE__
explicitly, like:
internal_error (__FILE__, __LINE__, "foo %d", var);
The need to pass in explicit __FILE__/__LINE__ is there probably
because the function predates widespread and portable variadic macros
availability. We can use variadic macros nowadays, and in fact, we
already use them in several places, including the related
gdb_assert_not_reached.
So this patch renames the internal_error function to something else,
and then reimplements internal_error as a variadic macro that expands
__FILE__/__LINE__ itself.
The result is that we now should call internal_error like so:
internal_error ("foo %d", var);
Likewise for internal_warning.
The patch adjusts all calls sites. 99% of the adjustments were done
with a perl/sed script.
The non-mechanical changes are in gdbsupport/errors.h,
gdbsupport/gdb_assert.h, and gdb/gdbarch.py.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
|
|
Converting from free-form macros to an enum gives a bit of type-safety.
This caught places where we would assign host error numbers to what
should contain a target fileio error number, for instance in
target_fileio_pread.
I added the FILEIO_SUCCESS enumerator, because
remote.c:remote_hostio_parse_result initializes the remote_errno output
variable to 0. It seems better to have an explicit enumerator than to
assign a value for which there is no enumerator. I considered
initializing this variable to FILEIO_EUNKNOWN instead, such that if the
remote side replies with an error and omits the errno value, we'll get
an errno that represents an error instead of 0 (which reprensents no
error). But it's not clear what the consequences of that change would
be, so I prefer to err on the side of caution and just keep the existing
behavior (there is no intended change in behavior with this patch).
Note that remote_hostio_parse_resul still reads blindly what the remote
side sends as a target errno into this variable, so we can still end up
with a nonsensical value here. It's not good, but out of the scope of
this patch.
Convert host_to_fileio_error and fileio_errno_to_host to return / accept
a fileio_error instead of an int, and cascade the change in the whole
chain that uses that.
Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
|
|
Commit 05c06f318fd9a112529dfc313e6512b399a645e4 enabled GDB to access
memory while threads are running. It did this by accessing
/proc/PID/task/LWP/mem.
Unfortunately, this interface is not implemented for writing in older
kernels (such as RHEL6). This means that GDB is unable to insert
breakpoints on these hosts:
$ ./gdb -q gdb -ex start
Reading symbols from gdb...
Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28.
Starting program: /home/rhel6/fsf/linux/gdb/gdb
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x40fdd5
(gdb)
Before this patch, linux_proc_xfer_memory_partial (previously called
linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to
/proc/PID/mem failed. [More specifically, linux_proc_xfer_partial
would not "bother for one word," but the effect is the essentially
same.]
This status was checked by linux_nat_target::xfer_partial, which would
then fallback to using ptrace to perform the operation.
This is the specific hunk that removed the fallback:
- xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf,
- offset, len, xfered_len);
- if (xfer != TARGET_XFER_EOF)
- return xfer;
+ return linux_proc_xfer_memory_partial (readbuf, writebuf,
+ offset, len, xfered_len);
+ }
return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf,
offset, len, xfered_len);
This patch makes linux_nat_target::xfer_partial go straight to writing
memory via ptrace if writing via /proc/pid/mem is not possible in the
running kernel, enabling GDB to insert breakpoints on these older
kernels. Note that a recent patch changed the return status from
TARGET_XFER_EOF to TARGET_XFER_E_IO.
Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64}
on x86_64, s390x, aarch64, and ppc64le.
Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
|
|
Probe whether /proc/pid/mem is writable, by using it to write to a GDB
variable. This will be used in the following patch to avoid falling
back to writing to inferior memory with ptrace if /proc/pid/mem _is_
writable.
Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e
|