Age | Commit message (Collapse) | Author | Files | Lines |
|
While working on some of the recent patches relating to vfork handling
I found this debug output very helpful, I'd like to propose adding
this into GDB.
With debug turned off there should be no user visible changes after
this commit.
|
|
While working on a later patch, which changes gdb.base/foll-vfork.exp,
I noticed that sometimes I would hit this assert:
x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
I eventually tracked it down to a combination of schedule-multiple
mode being on, target-non-stop being off, follow-fork-mode being set
to child, and some bad timing. The failing case is pretty simple, a
single threaded application performs a vfork, the child process then
execs some other application while the parent process (once the vfork
child has completed its exec) just exits. As best I understand
things, here's what happens when things go wrong:
1. The parent process performs a vfork, GDB sees the VFORKED event
and creates an inferior and thread for the vfork child,
2. GDB resumes the vfork child process. As schedule-multiple is on
and target-non-stop is off, this is translated into a request to
start all processes (see user_visible_resume_ptid),
3. In the linux-nat layer we spot that one of the threads we are
about to start is a vfork parent, and so don't start that
thread (see resume_lwp), the vfork child thread is resumed,
4. GDB waits for the next event, eventually entering
linux_nat_target::wait, which in turn calls linux_nat_wait_1,
5. In linux_nat_wait_1 we eventually call
resume_stopped_resumed_lwps, this should restart threads that have
stopped but don't actually have anything interesting to report.
6. Unfortunately, resume_stopped_resumed_lwps doesn't check for
vfork parents like resume_lwp does, so at this point the vfork
parent is resumed. This feels like the start of the bug, and this
is where I'm proposing to fix things, but, resuming the vfork parent
isn't the worst thing in the world because....
7. As the vfork child is still alive the kernel holds the vfork
parent stopped,
8. Eventually the child performs its exec and GDB is sent and EXECD
event. However, because the parent is resumed, as soon as the child
performs its exec the vfork parent also sends a VFORK_DONE event to
GDB,
9. Depending on timing both of these events might seem to arrive in
GDB at the same time. Normally GDB expects to see the EXECD or
EXITED/SIGNALED event from the vfork child before getting the
VFORK_DONE in the parent. We know this because it is as a result of
the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see
handle_vfork_child_exec_or_exit for details). Further the comment
in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that
when we remain attached to the child (not the parent) we should not
expect to see a VFORK_DONE,
10. If both events arrive at the same time then GDB will randomly
choose one event to handle first, in some cases this will be the
VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB
expects that (a) the vfork child has finished, however, in this case
this is not completely true, the child has finished, but GDB has not
processed the event associated with the completion yet, and (b) upon
seeing a VFORK_DONE GDB assumes we are remaining attached to the
parent, and so resumes the parent process,
11. GDB now handles the EXECD event. In our case we are detaching
from the parent, so GDB calls target_detach (see
handle_vfork_child_exec_or_exit),
12. While this has been going on the vfork parent is executing, and
might even exit,
13. In linux_nat_target::detach the first thing we do is stop all
threads in the process we're detaching from, the result of the stop
request will be cached on the lwp_info object,
14. In our case the vfork parent has exited though, so when GDB
waits for the thread, instead of a stop due to signal, we instead
get a thread exited status,
15. Later in the detach process we try to resume the threads just
prior to making the ptrace call to actually detach (see
detach_one_lwp), as part of the process to resume a thread we try to
touch some registers within the thread, and before doing this GDB
asserts that the thread is stopped,
16. An exited thread is not classified as stopped, and so the assert
triggers!
So there's two bugs I see here. The first, and most critical one here
is in step #6. I think that resume_stopped_resumed_lwps should not
resume a vfork parent, just like resume_lwp doesn't resume a vfork
parent.
With this change in place the vfork parent will remain stopped in step
instead GDB will only see the EXECD/EXITED/SIGNALLED event. The
problems in #9 and #10 are therefore skipped and we arrive at #11,
handling the EXECD event. As the parent is still stopped #12 doesn't
apply, and in #13 when we try to stop the process we will see that it
is already stopped, there's no risk of the vfork parent exiting before
we get to this point. And finally, in #15 we are safe to poke the
process registers because it will not have exited by this point.
However, I did mention two bugs.
The second bug I've not yet managed to actually trigger, but I'm
convinced it must exist: if we forget vforks for a moment, in step #13
above, when linux_nat_target::detach is called, we first try to stop
all threads in the process GDB is detaching from. If we imagine a
multi-threaded inferior with many threads, and GDB running in non-stop
mode, then, if the user tries to detach there is a chance that thread
could exit just as linux_nat_target::detach is entered, in which case
we should be able to trigger the same assert.
But, like I said, I've not (yet) managed to trigger this second bug,
and even if I could, the fix would not belong in this commit, so I'm
pointing this out just for completeness.
There's no test included in this commit. In a couple of commits time
I will expand gdb.base/foll-vfork.exp which is when this bug would be
exposed. Unfortunately there are at least two other bugs (separate
from the ones discussed above) that need fixing first, these will be
fixed in the next commits before the gdb.base/foll-vfork.exp test is
expanded.
If you do want to reproduce this failure then you will for certainly
need to run the gdb.base/foll-vfork.exp test in a loop as the failures
are all very timing sensitive. I've found that running multiple
copies in parallel makes the failure more likely to appear, I usually
run ~6 copies in parallel and expect to see a failure after within
10mins.
|
|
Since commit 05c06f318fd9 ("Linux: Access memory even if threads are
running"), GDB prefers pread64/pwrite64 to access inferior memory
instead of ptrace. That change broke reading shared libraries on
SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared
libraries on SPARC64").
On SPARC64 Linux, surprisingly (to me), userspace shared libraries are
mapped at high 64-bit addresses:
(gdb) info sharedlibrary
Cannot access memory at address 0xfff80001002011e0
Cannot access memory at address 0xfff80001002011d8
Cannot access memory at address 0xfff80001002011d8
From To Syms Read Shared Object Library
0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2
(*): Shared library is missing debugging information.
Those addresses are 64-bit addresses with the high bits set. When
interpreted as signed, they're negative.
The Linux kernel rejects pread64/pwrite64 if the offset argument of
type off_t (a signed type) is negative, which happens if the memory
address we're accessing has its high bit set. See
linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux.
Thankfully, lseek does not fail in that situation. So the fix is to
use the 'lseek + read|write' path if the offset would be negative.
Fix this in both native GDB and GDBserver.
Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525
Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
|
|
Fix some more typos:
- distinquish -> distinguish
- actualy -> actually
- singe -> single
- frash -> frame
- chid -> child
- dissassembler -> disassembler
- uninitalized -> uninitialized
- precontidion -> precondition
- regsiters -> registers
- marge -> merge
- sate -> state
- garanteed -> guaranteed
- explictly -> explicitly
- prefices (nonstandard plural) -> prefixes
- bondary -> boundary
- formated -> formatted
- ithe -> the
- arrav -> array
- coresponding -> corresponding
- owend -> owned
- fials -> fails
- diasm -> disasm
- ture -> true
- tpye -> type
There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to
SIG_CODE_BOUNDARY_FAULT.
Tested on x86_64-linux.
|
|
Make find_thread_ptid (the overload that takes a process_stratum_target)
a method of process_stratum_target.
Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
I noticed that some debug log output printing an lwp's pending status
wasn't considering lp->waitstatus. This fixes it, by introducing a
new pending_status_str function.
Also fix the comment in gdb/linux-nat.h describing
lwp_info::waitstatus and details the description of lwp_info::status
while at it.
Change-Id: I66e5c7a363d30a925b093b195d72925ce5b6b980
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Replace spaces with tabs in a bunch of places.
Change-Id: If0f87180f1d13028dc178e5a8af7882a067868b0
|
|
I've long wanted to remove 'struct buffer', and thanks to Simon's
earlier patch, I was finally able to do so. My feeling has been that
gdb already has several decent structures available for growing
strings: std::string of course, but also obstack and even objalloc
from BFD and dyn-string from libiberty. The previous patches in this
series removed all the uses of struct buffer, so this one can remove
the code and the remaining #includes.
|
|
While investigating something else, I noticed some weird code in
agent_run_command (use of memcpy rather than strcpy). Then I noticed
that 'cmd' is used as both an in and out parameter, despite being
const.
Casting away const like this is bad. This patch removes the const and
fixes the memcpy. I also added a static assert to assure myself that
the code in gdbserver is correct -- gdbserver is passing its own
buffer directly to agent_run_command.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
|
|
As of 1bcb0708f229 ("gdb/linux-nat: Check whether /proc/pid/mem is
writable"), GDB checks if /proc/pid/mem is writable. This is done
early at GDB startup, in order to get a consistent warning, instead of
a warning that depends on whenever GDB writes to inferior memory.
PR gdb/29907 points out that some build systems (like QEMU's,
apparently) may call 'gdb --version' to check GDB's presence & its
version on the system, and that Gentoo's build process has sandboxing
which blocks the /proc/pid/mem access and thus GDB warns, which
results in build fails.
To help with that, this patch delays the /proc/pid/mem check until we
start or attach to an inferior. Ends up potentially emiting a warning
close where we already emit other ptrace- and /proc- related warnings,
which just Feels Right.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29907
Change-Id: I5537653ecfbbe76a04ab035e40e59d09b4980763
|
|
Make the inferior_ptid bubble up to linux_nat_target::xfer_partial.
Change-Id: I62dbc5734c26648bb465f449c2003c73751cd812
|
|
I noticed we could reduce duplication a bit here.
Change-Id: If24e54d1dac71b46f7c1f68a18a073d4c084b644
|
|
Not a big deal, but it seems strange to check errno instead of the
ptrace return value to know whether it succeeded.
Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
|
|
No caller cares about the value of *SIGINFO on failure. It's also
documented in the function doc that *SIGINFO is uninitialized (I
understand "untouched") on failure.
Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
|
|
Change return type to bool.
Change-Id: I1bf0360bfdd1b5994cd0f96c268d806f96fe51a4
|
|
No behavior change expected.
Change-Id: Ifaa64ecd619483646b024fd7c62e571e92a8eedb
|
|
Add a pid parameter to linux_proc_xfer_memory_partial, making the
inferior_ptid reference bubble up close to the target_ops::xfer_partial
boundary. No behavior change expected.
Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
|
|
linux_handle_extended_wait calls target_post_attach if we're handling
a PTRACE_EVENT_CLONE, and libthread_db.so isn't active.
target_post_attach just calls linux_init_ptrace_procfs to set the
lwp's ptrace options. However, this is completely unnecessary,
because, as man ptrace [1] says, options are inherited:
"Flags are inherited by new tracees created and "auto-attached" via
active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE
options."
This removes the unnecessary call.
[1] - https://man7.org/linux/man-pages/man2/ptrace.2.html
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
|
|
Currently, every internal_error call must be passed __FILE__/__LINE__
explicitly, like:
internal_error (__FILE__, __LINE__, "foo %d", var);
The need to pass in explicit __FILE__/__LINE__ is there probably
because the function predates widespread and portable variadic macros
availability. We can use variadic macros nowadays, and in fact, we
already use them in several places, including the related
gdb_assert_not_reached.
So this patch renames the internal_error function to something else,
and then reimplements internal_error as a variadic macro that expands
__FILE__/__LINE__ itself.
The result is that we now should call internal_error like so:
internal_error ("foo %d", var);
Likewise for internal_warning.
The patch adjusts all calls sites. 99% of the adjustments were done
with a perl/sed script.
The non-mechanical changes are in gdbsupport/errors.h,
gdbsupport/gdb_assert.h, and gdb/gdbarch.py.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
|
|
Converting from free-form macros to an enum gives a bit of type-safety.
This caught places where we would assign host error numbers to what
should contain a target fileio error number, for instance in
target_fileio_pread.
I added the FILEIO_SUCCESS enumerator, because
remote.c:remote_hostio_parse_result initializes the remote_errno output
variable to 0. It seems better to have an explicit enumerator than to
assign a value for which there is no enumerator. I considered
initializing this variable to FILEIO_EUNKNOWN instead, such that if the
remote side replies with an error and omits the errno value, we'll get
an errno that represents an error instead of 0 (which reprensents no
error). But it's not clear what the consequences of that change would
be, so I prefer to err on the side of caution and just keep the existing
behavior (there is no intended change in behavior with this patch).
Note that remote_hostio_parse_resul still reads blindly what the remote
side sends as a target errno into this variable, so we can still end up
with a nonsensical value here. It's not good, but out of the scope of
this patch.
Convert host_to_fileio_error and fileio_errno_to_host to return / accept
a fileio_error instead of an int, and cascade the change in the whole
chain that uses that.
Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
|
|
Commit 05c06f318fd9a112529dfc313e6512b399a645e4 enabled GDB to access
memory while threads are running. It did this by accessing
/proc/PID/task/LWP/mem.
Unfortunately, this interface is not implemented for writing in older
kernels (such as RHEL6). This means that GDB is unable to insert
breakpoints on these hosts:
$ ./gdb -q gdb -ex start
Reading symbols from gdb...
Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28.
Starting program: /home/rhel6/fsf/linux/gdb/gdb
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x40fdd5
(gdb)
Before this patch, linux_proc_xfer_memory_partial (previously called
linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to
/proc/PID/mem failed. [More specifically, linux_proc_xfer_partial
would not "bother for one word," but the effect is the essentially
same.]
This status was checked by linux_nat_target::xfer_partial, which would
then fallback to using ptrace to perform the operation.
This is the specific hunk that removed the fallback:
- xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf,
- offset, len, xfered_len);
- if (xfer != TARGET_XFER_EOF)
- return xfer;
+ return linux_proc_xfer_memory_partial (readbuf, writebuf,
+ offset, len, xfered_len);
+ }
return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf,
offset, len, xfered_len);
This patch makes linux_nat_target::xfer_partial go straight to writing
memory via ptrace if writing via /proc/pid/mem is not possible in the
running kernel, enabling GDB to insert breakpoints on these older
kernels. Note that a recent patch changed the return status from
TARGET_XFER_EOF to TARGET_XFER_E_IO.
Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64}
on x86_64, s390x, aarch64, and ppc64le.
Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
|
|
Probe whether /proc/pid/mem is writable, by using it to write to a GDB
variable. This will be used in the following patch to avoid falling
back to writing to inferior memory with ptrace if /proc/pid/mem _is_
writable.
Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e
|
|
This changes the parameter of target_ops::async from int to bool.
Regression tested on x86-64 Fedora 34.
|
|
For every stop, Linux GDB and GDBserver save the stopped thread's PC,
in lwp->stop_pc. This is done in save_stop_reason, in both
gdb/linux-nat.c and gdbserver/linux-low.cc. However, while we're
going through the shell after "run", in startup_inferior, we shouldn't
be reading registers, as we haven't yet determined the target's
architecture -- the shell's architecture may not even be the same as
the final inferior's.
In gdb/linux-nat.c, lwp->stop_pc is only needed when the thread has
stopped for a breakpoint, and since when going through the shell, no
breakpoint is going to hit, we could simply teach save_stop_reason to
only record the stop pc when the thread stopped for a breakpoint.
However, in gdbserver/linux-low.cc, lwp->stop_pc is used in more cases
than breakpoint hits (e.g., it's used in tracepoints & the
"while-stepping" feature).
So to avoid GDB vs GDBserver divergence, we apply the same approach to
both implementations.
We set a flag in the inferior (process in GDBserver) whenever it is
being nursed through the shell, and when that flag is set,
save_stop_reason bails out early. While going through the shell,
we'll only ever get process exits (normal or signalled), random
signals, and exec events, so nothing is lost.
Change-Id: If0f01831514d3a74d17efd102875de7d2c6401ad
|
|
When accessing /proc/PID/mem, if pread64/pwrite64/read/write encounters
an error and return -1, linux_proc_xfer_memory_partial return
TARGET_XFER_EOF.
I think it should return TARGET_XFER_E_IO in this case. TARGET_XFER_EOF
is returned when pread64/pwrite64/read/frite returns 0, which indicates
that the address space is gone and the whole process has exited or
execed.
This patch makes this change.
Regression tested on x86_64-linux-gnu.
Change-Id: I6030412459663b8d7933483fdda22a6c2c5d7221
|
|
This changes target_pid_to_exec_file and target_ops::pid_to_exec_file
to return a "const char *". I couldn't build many of these targets,
but did examine the code by hand -- also, as this only affects the
return type, it's normally pretty safe. This brings gdb and gdbserver
a bit closer, and allows for the removal of a const_cast as well.
|
|
The current target_resume interface is a bit odd & non-intuitive.
I've found myself explaining it a couple times the recent past, while
reviewing patches that assumed STEP/SIGNAL always applied to the
passed in PTID. It goes like this today:
- if the passed in PTID is a thread, then the step/signal request is
for that thread.
- otherwise, if PTID is a wildcard (all threads or all threads of
process), the step/signal request is for inferior_ptid, and PTID
indicates which set of threads run free.
Because GDB always switches the current thread to "leader" thread
being resumed/stepped/signalled, we can simplify this a bit to:
- step/signal are always for inferior_ptid.
- PTID indicates the set of threads that run free.
Still not ideal, but it's a minimal change and at least there are no
special cases this way.
That's what this patch does. It renames the PTID parameter to
SCOPE_PTID, adds some assertions to target_resume, and tweaks
target_resume's description. In addition, it also renames PTID to
SCOPE_PTID in the remote and linux-nat targets, and simplifies their
implementation a little bit. Other targets could do the same, but
they don't have to.
Change-Id: I02a2ec2ab3a3e9b191de1e9a84f55c17cab7daaf
|
|
linux_handle_extended_wait
The check removed by this patch, using current_inferior, looks wrong.
When debugging multiple inferiors with the Linux native target and
linux_handle_extended_wait is called, there's no guarantee about which
is the current inferior. The vfork-done event we receive could be for
any inferior. If the vfork-done event is for a non-current inferior, we
end up wrongfully ignoring it. As a result, the core never processes a
TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed
is never cleared, and breakpoints are never reinserted. However,
because the Linux native target decided to ignore the event, it resumed
the thread - while breakpoints out. And that's bad.
The proposed fix is to remove this check. Always report vfork-done
events and let infrun's logic decide if it should be ignored. We don't
save much cycles by filtering the event here.
Add a test that replicates the situation described above. See comments
in the test for more details.
Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
|
|
Now that filtered and unfiltered output can be treated identically, we
can unify the printf family of functions. This is done under the name
"gdb_printf". Most of this patch was written by script.
|
|
A number of spots call printf_unfiltered only because they are in code
that should not be interrupted by the pager. However, I believe these
cases are all handled by infrun's blanket ban on paging, and so can be
converted to the default (_filtered) API.
After this patch, I think all the remaining _unfiltered calls are ones
that really ought to be. A few -- namely in complete_command -- could
be replaced by a scoped assignment to pagination_enabled, but for the
remainder, the code seems simple enough like this.
|
|
The current zombie leader detection code in linux-nat.c has a race --
if a multi-threaded inferior exits just before check_zombie_leaders
finds that the leader is now zombie via checking /proc/PID/status,
check_zombie_leaders deletes the leader, assuming we won't get an
event for that exit (which we won't in some scenarios, but not in this
one). That might seem mostly harmless, but it has some downsides:
- later when we continue pulling events out of the kernel, we will
collect the exit event of the non-leader threads, and once we see
the last lwp in our list exit, we return _that_ lwp's exit code as
whole-process exit code to infrun, instead of the leader's exit
code.
- this can cause a hang in stop_all_threads in infrun.c. Say there
are 2 threads in the process. stop_all_threads stops each of those
threads, and then waits for two stop or exit events, one for each
thread. If the whole process exits, and check_zombie_leaders hits
the false-positive case, linux-nat.c will only return one event to
GDB (the whole-process exit returned when we see the last thread,
the non-leader thread, exit), making stop_all_threads hang forever
waiting for a second event that will never come.
However, in this false-positive scenario, where the whole process is
exiting, as opposed to just the leader (with pthread_exit(), for
example), we _will_ get an exit event shortly for the leader, after we
collect the exit event of all the other non-leader threads. Or put
another way, we _always_ get an event for the leader after we see it
become zombie.
I tried a number of approaches to fix this:
#1 - My first thought to address the race was to make GDB always
report the whole-process exit status for the leader thread, not for
whatever is the last lwp in the list. We _always_ get a final exit
(or exec) event for the leader, and when the race triggers, we're not
collecting it.
#2 - My second thought was to try to plug the race in the first place.
I thought of making GDB call waitpid/WNOHANG for all non-leader
threads immediately when the zombie leader is detected, assuming there
would be an exit event pending for each of them waiting to be
collected. Turns out that that doesn't work -- you can see the leader
become zombie _before_ the kernel kills all other threads. Waitpid in
that small time window returns 0, indicating no-event. Thankfully we
hit that race window all the time, which avoided trading one race for
another. Looking at the non-leader thread's status in /proc doesn't
help either, the threads are still in running state for a bit, for the
same reason.
#3 - My next attempt, which seemed promising, was to synchronously
stop and wait for the stop for each of the non-leader threads. For
the scenario in question, this will collect all the exit statuses of
the non-leader threads. Then, if we are left with only the zombie
leader in the lwp list, it means we either have a normal while-process
exit or an exec, in which case we should not delete the leader. If
_only_ the leader exited, like in gdb.threads/leader-exit.exp, then
after pausing threads, we will still have at least one live non-leader
thread in the list, and so we delete the leader lwp. I got this
working and polished, and it was only after staring at the kernel code
to convince myself that this would really work (and it would, for the
scenario I considered), that I realized I had failed to account for
one scenario -- if any non-leader thread is _already_ stopped when
some thread triggers a group exit, like e.g., if you have some threads
stopped and then resume just one thread with scheduler-locking or
non-stop, and that thread exits the process. I also played with
PTRACE_EVENT_EXIT, see if it would help in any way to plug the race,
and I couldn't find a way that it would result in any practical
difference compared to looking at /proc/PID/status, with respect to
having a race.
So I concluded that there's no way to plug the race, we just have to
deal with it. Which means, going back to approach #1. That is the
approach taken by this patch.
Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
|
|
Reorganize linux-nat.c:linux_nat_filter_event such that all the
handling for events for LWPs not in the LWP list is together. This
helps make a following patch clearer. The comments and debug messages
have also been tweaked - the end goal is to have them synchronized
with the gdbserver counterpart.
Change-Id: I8586d8dcd76d8bd3795145e3056fc660e3b2cd22
|
|
Subclasses of inf_ptrace_target have to opt-in to using the event_pipe
by implementing the can_async_p and async methods. For subclasses
which do this, inf_ptrace_target provides is_async_p, async_wait_fd
and closes the pipe in the close target method.
inf_ptrace_target also provides wrapper routines around the event pipe
(async_file_open, async_file_close, async_file_flush, and
async_file_mark) for use in target methods such as async.
inf_ptrace_target also exports a static async_file_mark_if_open
function which can be used in SIGCHLD signal handlers.
|
|
If the attach target supports async mode, enable it after the
attach target's ::attach method returns.
|
|
Now that target_resume always enables async mode after target::resume
returns, these calls are redundant.
The other place that target resume methods are invoked outside of
target_resume are as the beneath target in record_full_wait_1. In
this case, async mode should already be enabled when supported by the
target before the resume method is invoked due to the following:
In general, targets which support async mode run as async until
::wait returns TARGET_WAITKIND_NO_RESUMED to indicate that there are
no unwaited for children (either they have exited or are stopped).
When that occurs, the loop in wait_one disables async mode. Later
if a stopped child is resumed, async mode is re-enabled in
do_target_resume before waiting for the next event.
In the case of record_full_wait_1, this function is invoked from the
::wait target method when fetching an event. If the underlying
target supports async mode, then an earlier call to do_target_resume
to resume the child reporting an event in the loop in
record_full_wait_1 would have already enabled async mode before
::wait was invoked. In addition, nothing in the code executed in
the loop in record_full_wait_1 disables async mode. Async mode is
only disabled higher in the call stack in wait_one after ::wait
returns.
It is also true that async mode can be disabled by an
INF_EXEC_COMPLETE event passed to inferior_event_handle, but all of
the places that invoke that are in the gdb core which is "above" a
target ::wait method.
Note that there is an earlier call to enable async mode in
linux_nat_target::resume. That call also marks the async event pipe
to report an existing event after enabling async mode, so it needs to
stay.
|
|
Use event_pipe from gdbsupport in place of the existing file
descriptor array.
|
|
Change-Id: I80328fab7096221356864b5a4fb30858b48d2c10
|
|
clone, sysgood
I think it's safe to remove checking support for these ptrace features,
they have all been added in what is now ancient times (around the
beginning of Linux 2.6). This allows removing a bit of complexity in
linux-nat.c and nat/linux-ptrace.c.
It also allows saving one extra fork every time we start debugging on
Linux: linux_check_ptrace_features forks a child process to test if some
ptrace features are supported. That child process forks a grand-child,
to test whether ptrace reports an event for the fork by the child. This
is no longer needed, if we assume the kernel supports reporting forks.
PTRACE_O_TRACEVFORKDONE was introduced in Linux in this change, in 2003:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=45c1a159b85b3b30afd26a77b4be312226bba416
PTRACE_O_TRACESYSGOOD was supported at least as of this change, in 2002:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=acc7088569c8eef04eeed0eff51d23bb5bcff964
PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, PTRACE_O_TRACEEXEC and
PTRACE_O_TRACECLONE were introduced in this change, in 2002:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=a0691b116f6a4473f0fa264210ab9b95771a2b46
Change-Id: Iffb906549a89cc6b619427f976ec044706ab1e8d
|
|
Same idea as 0fab79556484 ("gdb: use ptid_t::to_string in infrun debug
messages"), but throughout GDB.
Change-Id: I62ba36eaef29935316d7187b9b13d7b88491acc1
|
|
Rename 'set debug lin-lwp' to 'set debug linux-nat' and 'show debug
lin-lwp' to 'show debug linux-nat'.
I've updated the documentation and help text to match, as well as
making it clear that the debug that is coming out relates to all
aspects of Linux native inferior support, not just the LWP aspect of
it.
The boundary between general "native" target debug, and the lwp
specific part of that debug was always a little blurry, but the actual
debug variable inside GDB is debug_linux_nat, and the print routine
linux_nat_debug_printf, is used throughout the linux-nat.c file, not
just for lwp related debug, so the new name seems to make more sense.
|
|
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
|
|
Convert the 'set debug lin-lwp' command to a boolean. Adds a new
LINUX_NAT_SCOPED_DEBUG_ENTER_EXIT macro, and makes use of it in one
place (linux_nat_target::stop).
The manual entry for 'set debug lin-lwp' is already vague about
exactly what arguments this command takes, and the description talks
about turning debug on and off, so I don't think there's any updates
required there.
I have updated the doc strings shown when the users enters 'help show
debug lin-lwp' or 'help show debug lin-lwp'. The old title lines used
to talk about the 'GNU/Linux lwp module', but this debug flag is now
used for any native linux target debug, so we now talk about
'GNU/Linux native target'. The body string for this setting has been
changed from 'Enables printf debugging output.' to 'When on, print
debug messages relating to the GNU/Linux native target.', the old
value looks like a cut&paste error to me.
|
|
While working on a later patch that required me to understand how GDB
starts up inferiors, I was confused by the
target_ops::post_startup_inferior method.
The post_startup_inferior target function is only called from
inf_ptrace_target::create_inferior.
Part of the target class hierarchy looks like this:
inf_child_target
|
'-- inf_ptrace_target
|
|-- linux_nat_target
|
|-- fbsd_nat_target
|
|-- nbsd_nat_target
|
|-- obsd_nat_target
|
'-- rs6000_nat_target
Every sub-class of inf_ptrace_target, except rs6000_nat_target,
implements ::post_startup_inferior. The rs6000_nat_target picks up
the implementation of ::post_startup_inferior not from
inf_ptrace_target, but from inf_child_target.
No descendent of inf_child_target, outside the inf_ptrace_target
sub-tree, implements ::post_startup_inferior, which isn't really
surprising, as they would never see the method called (remember, the
method is only called from inf_ptrace_target::create_inferior).
What I find confusing is the role inf_child_target plays in
implementing, what is really a helper function for just one of its
descendents.
In this commit I propose that we formally make ::post_startup_inferior
a helper function of inf_ptrace_target. To do this I will remove the
::post_startup_inferior from the target_ops API, and instead make this
a protected, pure virtual function on inf_ptrace_target.
I'll remove the empty implementation of ::post_startup_inferior from
the inf_child_target class, and add a new empty implementation to the
rs6000_nat_target class.
All the other descendents of inf_ptrace_target already provide an
implementation of this method and so don't need to change beyond
making the method protected within their class declarations.
To me, this makes much more sense now. The helper function, which is
only called from within the inf_ptrace_target class, is now a part of
the inf_ptrace_target class.
The only way in which this change is visible to a user is if the user
turns on 'set debug target 1'. With this debug flag on, prior to this
patch the user would see something like:
-> native->post_startup_inferior (...)
<- native->post_startup_inferior (2588939)
After this patch these lines are no longer present, as the
post_startup_inferior is no longer a top level target method. For me,
this is an acceptable change.
|
|
While working with pending fork events, I wondered what would happen if
the user detached an inferior while a thread of that inferior had a
pending fork event. What happens with the fork child, which is
ptrace-attached by the GDB process (or by GDBserver), but not known to
the core? Sure enough, neither the core of GDB or the target detach the
child process, so GDB (or GDBserver) just stays ptrace-attached to the
process. The result is that the fork child process is stuck, while you
would expect it to be detached and run.
Make GDBserver detach of fork children it knows about. That is done in
the generic handle_detach function. Since a process_info already exists
for the child, we can simply call detach_inferior on it.
GDB-side, make the linux-nat and remote targets detach of fork children
known because of pending fork events. These pending fork events can be
stored in:
- thread_info::pending_waitstatus, if the core has consumed the event
but then saved it for later (for example, because it got the event
while stopping all threads, to present an all-stop stop on top of a
non-stop target)
- thread_info::pending_follow: if we ran to a "catch fork" and we
detach at that moment
Additionally, pending fork events can be in target-specific fields:
- For linux-nat, they can be in lwp_info::status and
lwp_info::waitstatus.
- For the remote target, they could be stored as pending stop replies,
saved in `remote_state::notif_state::pending_event`, if not
acknowledged yet, or in `remote_state::stop_reply_queue`, if
acknowledged. I followed the model of remove_new_fork_children for
this: call remote_notif_get_pending_events to process /
acknowledge any unacknowledged notification, then look through
stop_reply_queue.
Update the gdb.threads/pending-fork-event.exp test (and rename it to
gdb.threads/pending-fork-event-detach.exp) to try to detach the process
while it is stopped with a pending fork event. In order to verify that
the fork child process is correctly detached and resumes execution
outside of GDB's control, make that process create a file in the test
output directory, and make the test wait $timeout seconds for that file
to appear (it happens instantly if everything goes well).
This test catches a bug in linux-nat.c, also reported as PR 28512
("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig()
const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind ==
TARGET_WAITKIND_SIGNALLED' failed.). When detaching a thread with a
pending event, get_detach_signal unconditionally fetches the signal
stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However,
that is only valid if the pending event is of type
TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (iit
would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean
the thread does not exist anymore, so we wouldn't be detaching it). Add
a condition in get_detach_signal to access the signal number only if the
wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0
instead (since the thread was not stopped with a signal to begin with).
Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to
verify that we consider events in pending stop replies in the remote
target. This test has many threads constantly forking, and we detach
from the program while the program is executing. That gives us some
chance that we detach while a fork stop reply is stored in the remote
target. To verify that we correctly detach all fork children, we ask
the parent to exit by sending it a SIGUSR1 signal and have it write a
file to the filesystem before exiting. Because the parent's main thread
joins the forking threads, and the forking threads wait for their fork
children to exit, if some fork child is not detach by GDB, the parent
will not write the file, and the test will time out. If I remove the
new remote_detach_pid calls in remote.c, the test fails eventually if I
run it in a loop.
There is a known limitation: we don't remove breakpoints from the
children before detaching it. So the children, could hit a trap
instruction after being detached and crash. I know this is wrong, and
it should be fixed, but I would like to handle that later. The current
patch doesn't fix everything, but it's a step in the right direction.
Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
|
|
The following patch will add some code paths that need to ptrace-detach
a given PID. Factor out the code that does this and put it in its own
function, so that it can be re-used.
Change-Id: Ie65ca0d89893b41aea0a23d9fc6ffbed042a9705
|
|
store_waitstatus is basically a translation function between a status
integer and an equivalent target_waitstatus object. It would make sense
for it to take the integer as a parameter and return the
target_waitstatus by value. Do that, and rename to
host_status_to_waitstatus. Users can then do:
ws = host_status_to_waitstatus (status)
which does the right thing, given the move constructor of
target_waitstatus.
Change-Id: I7a07d59d3dc19d3ed66929642f82f44f3e85d61b
|
|
This commit moves the target_async_permitted check out of each targets
::can_async_p method and into the target_can_async_p wrapper function.
I've left some asserts in the two ::can_async_p methods that I
changed, which will hopefully catch any direct calls to these methods
that might be added in the future.
There should be no user visible changes after this commit.
|
|
PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent
failure) shows that GDB can hit an unexpected scenario -- it can
happen that the kernel manages to open a /proc/PID/task/LWP/mem file,
but then reading from the file returns 0/EOF, even though the process
hasn't exited or execed.
"0" out of read/write is normally what you get when the address space
of the process the file was open for is gone, because the process
execed or exited. So when GDB gets the 0, it returns memory access
failure. In the bad case in question, the process hasn't execed or
exited, so GDB fails a memory access when the access should have
worked.
GDB has code in place to gracefully handle the case of opening the
/proc/PID/task/LWP/mem just while the LWP is exiting -- most often the
open fails with EACCES or ENOENT. When it happens, GDB just tries
opening the file for a different thread of the process. The testcase
is written such that it stresses GDB's logic of closing/reopening the
/proc/PID/task/LWP/mem file, by constantly spawning short lived
threads.
However, there's a window where the kernel manages to find the thread,
but the thread exits just after and clears its address space pointer.
In this case, the kernel creates a file successfully, but the file
ends up with no address space associated, so a subsequent read/write
returns 0/EOF too, just like if the whole process had execed or
exited. This is the case in question that GDB does not handle.
Oleg Nesterov gave this suggestion as workaround for that race:
gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm.
If statm reports something non-zero, then open() was "successfull".
I think that might work. However, I didn't try it, because I realized
we have another nasty race that that wouldn't fix.
The other race I realized is that because we close/reopen the
/proc/PID/task/LWP/mem file when GDB switches to a different inferior,
then it can happen that GDB reopens /proc/PID/task/LWP/mem just after
a thread execs, and before GDB has seen the corresponding exec event.
I.e., we can open a /proc/PID/task/LWP/mem file accessing the
post-exec address space thinking we're accessing the pre-exec address
space.
A few months back, Simon, Oleg and I discussed a similar race:
[Bug gdb/26754] Race condition when resuming threads and one does an exec
https://sourceware.org/bugzilla/show_bug.cgi?id=26754
The solution back then was to make the kernel fail any ptrace
operation until the exec event is consumed, with this kernel commit:
commit dbb5afad100a828c97e012c6106566d99f041db6
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed May 12 15:33:08 2021 +0200
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed May 12 10:45:22 2021 -0700
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
This however, only applies to ptrace, not to the /proc/pid/mem file
opening case. Also, even if it did apply to the file open case, we
would want to support current kernels until such a fix is more wide
spread anyhow.
So all in all, this commit gives up on the idea of only ever keeping
one /proc/pid/mem file descriptor open. Instead, make GDB open a
/proc/pid/mem per inferior, and keep it open until the inferior exits,
is detached or execs. Make GDB open the file right after the inferior
is created or is attached to or forks, at which point we know the
inferior is stable and stopped and isn't thus going to exec, or have a
thread exit, and so the file open won't fail (unless the whole process
is SIGKILLed from outside GDB, at which point it doesn't matter
whether we open the file).
This way, we avoid both races described above, at the expense of using
more file descriptors (one per inferior).
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
|
|
I stumbled on a bug caused by the fact that a code path read
target_waitstatus::value::sig (expecting it to contain a gdb_signal
value) while target_waitstatus::kind was TARGET_WAITKIND_FORKED. This
meant that the active union field was in fact
target_waitstatus::value::related_pid, and contained a ptid. The read
signal value was therefore garbage, and that caused GDB to crash soon
after. Or, since that GDB was built with ubsan, this nice error
message:
/home/simark/src/binutils-gdb/gdb/linux-nat.c:1271:12: runtime error: load of value 2686365, which is not a valid value for type 'gdb_signal'
Despite being a large-ish change, I think it would be nice to make
target_waitstatus safe against that kind of bug. As already done
elsewhere (e.g. dynamic_prop), validate that the type of value read from
the union matches what is supposed to be the active field.
- Make the kind and value of target_waitstatus private.
- Make the kind initialized to TARGET_WAITKIND_IGNORE on
target_waitstatus construction. This is what most users appear to do
explicitly.
- Add setters, one for each kind. Each setter takes as a parameter the
data associated to that kind, if any. This makes it impossible to
forget to attach the associated data.
- Add getters, one for each associated data type. Each getter
validates that the data type fetched by the user matches the wait
status kind.
- Change "integer" to "exit_status", "related_pid" to "child_ptid",
just because that's more precise terminology.
- Fix all users.
That last point is semi-mechanical. There are a lot of obvious changes,
but some less obvious ones. For example, it's not possible to set the
kind at some point and the associated data later, as some users did.
But in any case, the intent of the code should not change in this patch.
This was tested on x86-64 Linux (unix, native-gdbserver and
native-extended-gdbserver boards). It was built-tested on x86-64
FreeBSD, NetBSD, MinGW and macOS. The rest of the changes to native
files was done as a best effort. If I forgot any place to update in
these files, it should be easy to fix (unless the change happens to
reveal an actual bug).
Change-Id: I0ae967df1ff6e28de78abbe3ac9b4b2ff4ad03b7
|