aboutsummaryrefslogtreecommitdiff
path: root/gdb/linux-nat.c
AgeCommit message (Collapse)AuthorFilesLines
2023-07-17gdb: additional debug output in infrun.c and linux-nat.cAndrew Burgess1-4/+19
While working on some of the recent patches relating to vfork handling I found this debug output very helpful, I'd like to propose adding this into GDB. With debug turned off there should be no user visible changes after this commit.
2023-07-17gdb: don't restart vfork parent while waiting for child to finishAndrew Burgess1-1/+8
While working on a later patch, which changes gdb.base/foll-vfork.exp, I noticed that sometimes I would hit this assert: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. I eventually tracked it down to a combination of schedule-multiple mode being on, target-non-stop being off, follow-fork-mode being set to child, and some bad timing. The failing case is pretty simple, a single threaded application performs a vfork, the child process then execs some other application while the parent process (once the vfork child has completed its exec) just exits. As best I understand things, here's what happens when things go wrong: 1. The parent process performs a vfork, GDB sees the VFORKED event and creates an inferior and thread for the vfork child, 2. GDB resumes the vfork child process. As schedule-multiple is on and target-non-stop is off, this is translated into a request to start all processes (see user_visible_resume_ptid), 3. In the linux-nat layer we spot that one of the threads we are about to start is a vfork parent, and so don't start that thread (see resume_lwp), the vfork child thread is resumed, 4. GDB waits for the next event, eventually entering linux_nat_target::wait, which in turn calls linux_nat_wait_1, 5. In linux_nat_wait_1 we eventually call resume_stopped_resumed_lwps, this should restart threads that have stopped but don't actually have anything interesting to report. 6. Unfortunately, resume_stopped_resumed_lwps doesn't check for vfork parents like resume_lwp does, so at this point the vfork parent is resumed. This feels like the start of the bug, and this is where I'm proposing to fix things, but, resuming the vfork parent isn't the worst thing in the world because.... 7. As the vfork child is still alive the kernel holds the vfork parent stopped, 8. Eventually the child performs its exec and GDB is sent and EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB, 9. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE, 10. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process, 11. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit), 12. While this has been going on the vfork parent is executing, and might even exit, 13. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object, 14. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status, 15. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped, 16. An exited thread is not classified as stopped, and so the assert triggers! So there's two bugs I see here. The first, and most critical one here is in step #6. I think that resume_stopped_resumed_lwps should not resume a vfork parent, just like resume_lwp doesn't resume a vfork parent. With this change in place the vfork parent will remain stopped in step instead GDB will only see the EXECD/EXITED/SIGNALLED event. The problems in #9 and #10 are therefore skipped and we arrive at #11, handling the EXECD event. As the parent is still stopped #12 doesn't apply, and in #13 when we try to stop the process we will see that it is already stopped, there's no risk of the vfork parent exiting before we get to this point. And finally, in #15 we are safe to poke the process registers because it will not have exited by this point. However, I did mention two bugs. The second bug I've not yet managed to actually trigger, but I'm convinced it must exist: if we forget vforks for a moment, in step #13 above, when linux_nat_target::detach is called, we first try to stop all threads in the process GDB is detaching from. If we imagine a multi-threaded inferior with many threads, and GDB running in non-stop mode, then, if the user tries to detach there is a chance that thread could exit just as linux_nat_target::detach is entered, in which case we should be able to trigger the same assert. But, like I said, I've not (yet) managed to trigger this second bug, and even if I could, the fix would not belong in this commit, so I'm pointing this out just for completeness. There's no test included in this commit. In a couple of commits time I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. Unfortunately there are at least two other bugs (separate from the ones discussed above) that need fixing first, these will be fixed in the next commits before the gdb.base/foll-vfork.exp test is expanded. If you do want to reproduce this failure then you will for certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure after within 10mins.
2023-07-06Linux: Avoid pread64/pwrite64 for high memory addresses (PR gdb/30525)Pedro Alves1-10/+18
Since commit 05c06f318fd9 ("Linux: Access memory even if threads are running"), GDB prefers pread64/pwrite64 to access inferior memory instead of ptrace. That change broke reading shared libraries on SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared libraries on SPARC64"). On SPARC64 Linux, surprisingly (to me), userspace shared libraries are mapped at high 64-bit addresses: (gdb) info sharedlibrary Cannot access memory at address 0xfff80001002011e0 Cannot access memory at address 0xfff80001002011d8 Cannot access memory at address 0xfff80001002011d8 From To Syms Read Shared Object Library 0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2 (*): Shared library is missing debugging information. Those addresses are 64-bit addresses with the high bits set. When interpreted as signed, they're negative. The Linux kernel rejects pread64/pwrite64 if the offset argument of type off_t (a signed type) is negative, which happens if the memory address we're accessing has its high bit set. See linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux. Thankfully, lseek does not fail in that situation. So the fix is to use the 'lseek + read|write' path if the offset would be negative. Fix this in both native GDB and GDBserver. Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525 Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
2023-06-05[gdb] Fix more typosTom de Vries1-1/+1
Fix some more typos: - distinquish -> distinguish - actualy -> actually - singe -> single - frash -> frame - chid -> child - dissassembler -> disassembler - uninitalized -> uninitialized - precontidion -> precondition - regsiters -> registers - marge -> merge - sate -> state - garanteed -> guaranteed - explictly -> explicitly - prefices (nonstandard plural) -> prefixes - bondary -> boundary - formated -> formatted - ithe -> the - arrav -> array - coresponding -> corresponding - owend -> owned - fials -> fails - diasm -> disasm - ture -> true - tpye -> type There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to SIG_CODE_BOUNDARY_FAULT. Tested on x86_64-linux.
2023-04-04gdb: make find_thread_ptid a process_stratum_target methodSimon Marchi1-7/+7
Make find_thread_ptid (the overload that takes a process_stratum_target) a method of process_stratum_target. Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629 Reviewed-By: Tom Tromey <tom@tromey.com>
2023-03-27linux-nat: introduce pending_status_strPedro Alves1-3/+16
I noticed that some debug log output printing an lwp's pending status wasn't considering lp->waitstatus. This fixes it, by introducing a new pending_status_str function. Also fix the comment in gdb/linux-nat.h describing lwp_info::waitstatus and details the description of lwp_info::status while at it. Change-Id: I66e5c7a363d30a925b093b195d72925ce5b6b980 Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-03-09gdb, gdbserver, gdbsupport: fix whitespace issuesSimon Marchi1-1/+1
Replace spaces with tabs in a bunch of places. Change-Id: If0f87180f1d13028dc178e5a8af7882a067868b0
2023-02-24Remove struct bufferTom Tromey1-1/+0
I've long wanted to remove 'struct buffer', and thanks to Simon's earlier patch, I was finally able to do so. My feeling has been that gdb already has several decent structures available for growing strings: std::string of course, but also obstack and even objalloc from BFD and dyn-string from libiberty. The previous patches in this series removed all the uses of struct buffer, so this one can remove the code and the remaining #includes.
2023-02-14Do not cast away const in agent_run_commandTom Tromey1-5/+2
While investigating something else, I noticed some weird code in agent_run_command (use of memcpy rather than strcpy). Then I noticed that 'cmd' is used as both an in and out parameter, despite being const. Casting away const like this is bad. This patch removes the const and fixes the memcpy. I also added a static assert to assure myself that the code in gdbserver is correct -- gdbserver is passing its own buffer directly to agent_run_command. Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-01-01Update copyright year range in header of all files managed by GDBJoel Brobecker1-1/+1
This commit is the result of running the gdb/copyright.py script, which automated the update of the copyright year range for all source files managed by the GDB project to be updated to include year 2023.
2022-12-16Delay checking whether /proc/pid/mem is writable (PR gdb/29907)Pedro Alves1-3/+6
As of 1bcb0708f229 ("gdb/linux-nat: Check whether /proc/pid/mem is writable"), GDB checks if /proc/pid/mem is writable. This is done early at GDB startup, in order to get a consistent warning, instead of a warning that depends on whenever GDB writes to inferior memory. PR gdb/29907 points out that some build systems (like QEMU's, apparently) may call 'gdb --version' to check GDB's presence & its version on the system, and that Gentoo's build process has sandboxing which blocks the /proc/pid/mem access and thus GDB warns, which results in build fails. To help with that, this patch delays the /proc/pid/mem check until we start or attach to an inferior. Ends up potentially emiting a warning close where we already emit other ptrace- and /proc- related warnings, which just Feels Right. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29907 Change-Id: I5537653ecfbbe76a04ab035e40e59d09b4980763
2022-12-05gdb/linux-nat: add ptid parameter to linux_xfer_siginfoSimon Marchi1-4/+4
Make the inferior_ptid bubble up to linux_nat_target::xfer_partial. Change-Id: I62dbc5734c26648bb465f449c2003c73751cd812
2022-12-05gdb/linux-nat: use l linux_nat_get_siginfo in linux_xfer_siginfoSimon Marchi1-4/+2
I noticed we could reduce duplication a bit here. Change-Id: If24e54d1dac71b46f7c1f68a18a073d4c084b644
2022-12-05gdb/linux-nat: check ptrace return value in linux_nat_get_siginfoSimon Marchi1-5/+1
Not a big deal, but it seems strange to check errno instead of the ptrace return value to know whether it succeeded. Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
2022-12-05gdb/linux-nat: don't memset siginfo on failure in linux_nat_get_siginfoSimon Marchi1-6/+2
No caller cares about the value of *SIGINFO on failure. It's also documented in the function doc that *SIGINFO is uninitialized (I understand "untouched") on failure. Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
2022-12-05gdb/linux-nat: bool-ify linux_nat_get_siginfoSimon Marchi1-3/+3
Change return type to bool. Change-Id: I1bf0360bfdd1b5994cd0f96c268d806f96fe51a4
2022-12-05gdb/linux-nat: use get_ptrace_pid in two spotsSimon Marchi1-10/+2
No behavior change expected. Change-Id: Ifaa64ecd619483646b024fd7c62e571e92a8eedb
2022-12-02gdb/linux-nat: add pid parameter to linux_proc_xfer_memory_partialSimon Marchi1-9/+9
Add a pid parameter to linux_proc_xfer_memory_partial, making the inferior_ptid reference bubble up close to the target_ops::xfer_partial boundary. No behavior change expected. Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
2022-11-07Don't explicitly set clone child ptrace optionsPedro Alves1-1/+0
linux_handle_extended_wait calls target_post_attach if we're handling a PTRACE_EVENT_CLONE, and libthread_db.so isn't active. target_post_attach just calls linux_init_ptrace_procfs to set the lwp's ptrace options. However, this is completely unnecessary, because, as man ptrace [1] says, options are inherited: "Flags are inherited by new tracees created and "auto-attached" via active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE options." This removes the unnecessary call. [1] - https://man7.org/linux/man-pages/man2/ptrace.2.html Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
2022-10-19internal_error: remove need to pass __FILE__/__LINE__Pedro Alves1-9/+5
Currently, every internal_error call must be passed __FILE__/__LINE__ explicitly, like: internal_error (__FILE__, __LINE__, "foo %d", var); The need to pass in explicit __FILE__/__LINE__ is there probably because the function predates widespread and portable variadic macros availability. We can use variadic macros nowadays, and in fact, we already use them in several places, including the related gdb_assert_not_reached. So this patch renames the internal_error function to something else, and then reimplements internal_error as a variadic macro that expands __FILE__/__LINE__ itself. The result is that we now should call internal_error like so: internal_error ("foo %d", var); Likewise for internal_warning. The patch adjusts all calls sites. 99% of the adjustments were done with a perl/sed script. The non-mechanical changes are in gdbsupport/errors.h, gdbsupport/gdb_assert.h, and gdb/gdbarch.py. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
2022-09-21gdbsupport: convert FILEIO_* macros to an enumSimon Marchi1-3/+3
Converting from free-form macros to an enum gives a bit of type-safety. This caught places where we would assign host error numbers to what should contain a target fileio error number, for instance in target_fileio_pread. I added the FILEIO_SUCCESS enumerator, because remote.c:remote_hostio_parse_result initializes the remote_errno output variable to 0. It seems better to have an explicit enumerator than to assign a value for which there is no enumerator. I considered initializing this variable to FILEIO_EUNKNOWN instead, such that if the remote side replies with an error and omits the errno value, we'll get an errno that represents an error instead of 0 (which reprensents no error). But it's not clear what the consequences of that change would be, so I prefer to err on the side of caution and just keep the existing behavior (there is no intended change in behavior with this patch). Note that remote_hostio_parse_resul still reads blindly what the remote side sends as a target errno into this variable, so we can still end up with a nonsensical value here. It's not good, but out of the scope of this patch. Convert host_to_fileio_error and fileio_errno_to_host to return / accept a fileio_error instead of an int, and cascade the change in the whole chain that uses that. Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
2022-07-26gdb/linux_nat: Write memory using ptrace if /proc/pid/mem is not writableKeith Seitz1-2/+9
Commit 05c06f318fd9a112529dfc313e6512b399a645e4 enabled GDB to access memory while threads are running. It did this by accessing /proc/PID/task/LWP/mem. Unfortunately, this interface is not implemented for writing in older kernels (such as RHEL6). This means that GDB is unable to insert breakpoints on these hosts: $ ./gdb -q gdb -ex start Reading symbols from gdb... Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28. Starting program: /home/rhel6/fsf/linux/gdb/gdb Warning: Cannot insert breakpoint 1. Cannot access memory at address 0x40fdd5 (gdb) Before this patch, linux_proc_xfer_memory_partial (previously called linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to /proc/PID/mem failed. [More specifically, linux_proc_xfer_partial would not "bother for one word," but the effect is the essentially same.] This status was checked by linux_nat_target::xfer_partial, which would then fallback to using ptrace to perform the operation. This is the specific hunk that removed the fallback: - xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf, - offset, len, xfered_len); - if (xfer != TARGET_XFER_EOF) - return xfer; + return linux_proc_xfer_memory_partial (readbuf, writebuf, + offset, len, xfered_len); + } return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf, offset, len, xfered_len); This patch makes linux_nat_target::xfer_partial go straight to writing memory via ptrace if writing via /proc/pid/mem is not possible in the running kernel, enabling GDB to insert breakpoints on these older kernels. Note that a recent patch changed the return status from TARGET_XFER_EOF to TARGET_XFER_E_IO. Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64} on x86_64, s390x, aarch64, and ppc64le. Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
2022-07-26gdb/linux-nat: Check whether /proc/pid/mem is writablePedro Alves1-17/+88
Probe whether /proc/pid/mem is writable, by using it to write to a GDB variable. This will be used in the following patch to avoid falling back to writing to inferior memory with ptrace if /proc/pid/mem _is_ writable. Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e
2022-07-22Change target_ops::async to accept boolTom Tromey1-3/+3
This changes the parameter of target_ops::async from int to bool. Regression tested on x86-64 Fedora 34.
2022-06-28gdb+gdbserver/Linux: avoid reading registers while going through shellPedro Alves1-0/+4
For every stop, Linux GDB and GDBserver save the stopped thread's PC, in lwp->stop_pc. This is done in save_stop_reason, in both gdb/linux-nat.c and gdbserver/linux-low.cc. However, while we're going through the shell after "run", in startup_inferior, we shouldn't be reading registers, as we haven't yet determined the target's architecture -- the shell's architecture may not even be the same as the final inferior's. In gdb/linux-nat.c, lwp->stop_pc is only needed when the thread has stopped for a breakpoint, and since when going through the shell, no breakpoint is going to hit, we could simply teach save_stop_reason to only record the stop pc when the thread stopped for a breakpoint. However, in gdbserver/linux-low.cc, lwp->stop_pc is used in more cases than breakpoint hits (e.g., it's used in tracepoints & the "while-stepping" feature). So to avoid GDB vs GDBserver divergence, we apply the same approach to both implementations. We set a flag in the inferior (process in GDBserver) whenever it is being nursed through the shell, and when that flag is set, save_stop_reason bails out early. While going through the shell, we'll only ever get process exits (normal or signalled), random signals, and exec events, so nothing is lost. Change-Id: If0f01831514d3a74d17efd102875de7d2c6401ad
2022-05-26gdb/linux-nat: xfer_memory_partial return E_IO on errorLancelot SIX1-1/+1
When accessing /proc/PID/mem, if pread64/pwrite64/read/write encounters an error and return -1, linux_proc_xfer_memory_partial return TARGET_XFER_EOF. I think it should return TARGET_XFER_E_IO in this case. TARGET_XFER_EOF is returned when pread64/pwrite64/read/frite returns 0, which indicates that the address space is gone and the whole process has exited or execed. This patch makes this change. Regression tested on x86_64-linux-gnu. Change-Id: I6030412459663b8d7933483fdda22a6c2c5d7221
2022-05-13Constify target_pid_to_exec_fileTom Tromey1-1/+1
This changes target_pid_to_exec_file and target_ops::pid_to_exec_file to return a "const char *". I couldn't build many of these targets, but did examine the code by hand -- also, as this only affects the return type, it's normally pretty safe. This brings gdb and gdbserver a bit closer, and allows for the removal of a const_cast as well.
2022-04-29Slightly tweak and clarify target_resume's interfacePedro Alves1-19/+10
The current target_resume interface is a bit odd & non-intuitive. I've found myself explaining it a couple times the recent past, while reviewing patches that assumed STEP/SIGNAL always applied to the passed in PTID. It goes like this today: - if the passed in PTID is a thread, then the step/signal request is for that thread. - otherwise, if PTID is a wildcard (all threads or all threads of process), the step/signal request is for inferior_ptid, and PTID indicates which set of threads run free. Because GDB always switches the current thread to "leader" thread being resumed/stepped/signalled, we can simplify this a bit to: - step/signal are always for inferior_ptid. - PTID indicates the set of threads that run free. Still not ideal, but it's a minimal change and at least there are no special cases this way. That's what this patch does. It renames the PTID parameter to SCOPE_PTID, adds some assertions to target_resume, and tweaks target_resume's description. In addition, it also renames PTID to SCOPE_PTID in the remote and linux-nat targets, and simplifies their implementation a little bit. Other targets could do the same, but they don't have to. Change-Id: I02a2ec2ab3a3e9b191de1e9a84f55c17cab7daaf
2022-03-31gdb/linux-nat: remove check based on current_inferior in ↵Simon Marchi1-13/+4
linux_handle_extended_wait The check removed by this patch, using current_inferior, looks wrong. When debugging multiple inferiors with the Linux native target and linux_handle_extended_wait is called, there's no guarantee about which is the current inferior. The vfork-done event we receive could be for any inferior. If the vfork-done event is for a non-current inferior, we end up wrongfully ignoring it. As a result, the core never processes a TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed is never cleared, and breakpoints are never reinserted. However, because the Linux native target decided to ignore the event, it resumed the thread - while breakpoints out. And that's bad. The proposed fix is to remove this check. Always report vfork-done events and let infrun's logic decide if it should be ignored. We don't save much cycles by filtering the event here. Add a test that replicates the situation described above. See comments in the test for more details. Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
2022-03-29Unify gdb printf functionsTom Tromey1-4/+4
Now that filtered and unfiltered output can be treated identically, we can unify the printf family of functions. This is done under the name "gdb_printf". Most of this patch was written by script.
2022-03-29Remove some uses of printf_unfilteredTom Tromey1-2/+2
A number of spots call printf_unfiltered only because they are in code that should not be interrupted by the pager. However, I believe these cases are all handled by infrun's blanket ban on paging, and so can be converted to the default (_filtered) API. After this patch, I think all the remaining _unfiltered calls are ones that really ought to be. A few -- namely in complete_command -- could be replaced by a scoped assignment to pagination_enabled, but for the remainder, the code seems simple enough like this.
2022-03-10Re-add zombie leader on exit, gdb/linuxPedro Alves1-27/+80
The current zombie leader detection code in linux-nat.c has a race -- if a multi-threaded inferior exits just before check_zombie_leaders finds that the leader is now zombie via checking /proc/PID/status, check_zombie_leaders deletes the leader, assuming we won't get an event for that exit (which we won't in some scenarios, but not in this one). That might seem mostly harmless, but it has some downsides: - later when we continue pulling events out of the kernel, we will collect the exit event of the non-leader threads, and once we see the last lwp in our list exit, we return _that_ lwp's exit code as whole-process exit code to infrun, instead of the leader's exit code. - this can cause a hang in stop_all_threads in infrun.c. Say there are 2 threads in the process. stop_all_threads stops each of those threads, and then waits for two stop or exit events, one for each thread. If the whole process exits, and check_zombie_leaders hits the false-positive case, linux-nat.c will only return one event to GDB (the whole-process exit returned when we see the last thread, the non-leader thread, exit), making stop_all_threads hang forever waiting for a second event that will never come. However, in this false-positive scenario, where the whole process is exiting, as opposed to just the leader (with pthread_exit(), for example), we _will_ get an exit event shortly for the leader, after we collect the exit event of all the other non-leader threads. Or put another way, we _always_ get an event for the leader after we see it become zombie. I tried a number of approaches to fix this: #1 - My first thought to address the race was to make GDB always report the whole-process exit status for the leader thread, not for whatever is the last lwp in the list. We _always_ get a final exit (or exec) event for the leader, and when the race triggers, we're not collecting it. #2 - My second thought was to try to plug the race in the first place. I thought of making GDB call waitpid/WNOHANG for all non-leader threads immediately when the zombie leader is detected, assuming there would be an exit event pending for each of them waiting to be collected. Turns out that that doesn't work -- you can see the leader become zombie _before_ the kernel kills all other threads. Waitpid in that small time window returns 0, indicating no-event. Thankfully we hit that race window all the time, which avoided trading one race for another. Looking at the non-leader thread's status in /proc doesn't help either, the threads are still in running state for a bit, for the same reason. #3 - My next attempt, which seemed promising, was to synchronously stop and wait for the stop for each of the non-leader threads. For the scenario in question, this will collect all the exit statuses of the non-leader threads. Then, if we are left with only the zombie leader in the lwp list, it means we either have a normal while-process exit or an exec, in which case we should not delete the leader. If _only_ the leader exited, like in gdb.threads/leader-exit.exp, then after pausing threads, we will still have at least one live non-leader thread in the list, and so we delete the leader lwp. I got this working and polished, and it was only after staring at the kernel code to convince myself that this would really work (and it would, for the scenario I considered), that I realized I had failed to account for one scenario -- if any non-leader thread is _already_ stopped when some thread triggers a group exit, like e.g., if you have some threads stopped and then resume just one thread with scheduler-locking or non-stop, and that thread exits the process. I also played with PTRACE_EVENT_EXIT, see if it would help in any way to plug the race, and I couldn't find a way that it would result in any practical difference compared to looking at /proc/PID/status, with respect to having a race. So I concluded that there's no way to plug the race, we just have to deal with it. Which means, going back to approach #1. That is the approach taken by this patch. Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
2022-03-10gdb: Reorganize linux_nat_filter_eventPedro Alves1-35/+40
Reorganize linux-nat.c:linux_nat_filter_event such that all the handling for events for LWPs not in the LWP list is together. This helps make a following patch clearer. The comments and debug messages have also been tweaked - the end goal is to have them synchronized with the gdbserver counterpart. Change-Id: I8586d8dcd76d8bd3795145e3056fc660e3b2cd22
2022-02-22inf-ptrace: Add an event_pipe to be used for async mode in subclasses.John Baldwin1-99/+24
Subclasses of inf_ptrace_target have to opt-in to using the event_pipe by implementing the can_async_p and async methods. For subclasses which do this, inf_ptrace_target provides is_async_p, async_wait_fd and closes the pipe in the close target method. inf_ptrace_target also provides wrapper routines around the event pipe (async_file_open, async_file_close, async_file_flush, and async_file_mark) for use in target methods such as async. inf_ptrace_target also exports a static async_file_mark_if_open function which can be used in SIGCHLD signal handlers.
2022-02-22Enable async mode in the target in attach_cmd.John Baldwin1-3/+0
If the attach target supports async mode, enable it after the attach target's ::attach method returns.
2022-02-22Don't enable async mode at the end of target ::resume methods.John Baldwin1-3/+0
Now that target_resume always enables async mode after target::resume returns, these calls are redundant. The other place that target resume methods are invoked outside of target_resume are as the beneath target in record_full_wait_1. In this case, async mode should already be enabled when supported by the target before the resume method is invoked due to the following: In general, targets which support async mode run as async until ::wait returns TARGET_WAITKIND_NO_RESUMED to indicate that there are no unwaited for children (either they have exited or are stopped). When that occurs, the loop in wait_one disables async mode. Later if a stopped child is resumed, async mode is re-enabled in do_target_resume before waiting for the next event. In the case of record_full_wait_1, this function is invoked from the ::wait target method when fetching an event. If the underlying target supports async mode, then an earlier call to do_target_resume to resume the child reporting an event in the loop in record_full_wait_1 would have already enabled async mode before ::wait was invoked. In addition, nothing in the code executed in the loop in record_full_wait_1 disables async mode. Async mode is only disabled higher in the call stack in wait_one after ::wait returns. It is also true that async mode can be disabled by an INF_EXEC_COMPLETE event passed to inferior_event_handle, but all of the places that invoke that are in the gdb core which is "above" a target ::wait method. Note that there is an earlier call to enable async mode in linux_nat_target::resume. That call also marks the async event pipe to report an existing event after enabling async mode, so it needs to stay.
2022-02-22gdb linux-nat: Convert linux_nat_event_pipe to the event_pipe class.John Baldwin1-43/+16
Use event_pipe from gdbsupport in place of the existing file descriptor array.
2022-02-18gdb: remove newlines from some linux_nat_debug_printf callsSimon Marchi1-3/+3
Change-Id: I80328fab7096221356864b5a4fb30858b48d2c10
2022-02-10gdb/linux: remove ptrace support check for exec, fork, vfork, vforkdone, ↵Simon Marchi1-69/+8
clone, sysgood I think it's safe to remove checking support for these ptrace features, they have all been added in what is now ancient times (around the beginning of Linux 2.6). This allows removing a bit of complexity in linux-nat.c and nat/linux-ptrace.c. It also allows saving one extra fork every time we start debugging on Linux: linux_check_ptrace_features forks a child process to test if some ptrace features are supported. That child process forks a grand-child, to test whether ptrace reports an event for the fork by the child. This is no longer needed, if we assume the kernel supports reporting forks. PTRACE_O_TRACEVFORKDONE was introduced in Linux in this change, in 2003: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=45c1a159b85b3b30afd26a77b4be312226bba416 PTRACE_O_TRACESYSGOOD was supported at least as of this change, in 2002: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=acc7088569c8eef04eeed0eff51d23bb5bcff964 PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, PTRACE_O_TRACEEXEC and PTRACE_O_TRACECLONE were introduced in this change, in 2002: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=a0691b116f6a4473f0fa264210ab9b95771a2b46 Change-Id: Iffb906549a89cc6b619427f976ec044706ab1e8d
2022-01-18gdb: use ptid_t::to_string instead of target_pid_to_str in debug statementsSimon Marchi1-46/+46
Same idea as 0fab79556484 ("gdb: use ptid_t::to_string in infrun debug messages"), but throughout GDB. Change-Id: I62ba36eaef29935316d7187b9b13d7b88491acc1
2022-01-12gdb: rename lin-lwp to linux-nat in set/show debugAndrew Burgess1-3/+3
Rename 'set debug lin-lwp' to 'set debug linux-nat' and 'show debug lin-lwp' to 'show debug linux-nat'. I've updated the documentation and help text to match, as well as making it clear that the debug that is coming out relates to all aspects of Linux native inferior support, not just the LWP aspect of it. The boundary between general "native" target debug, and the lwp specific part of that debug was always a little blurry, but the actual debug variable inside GDB is debug_linux_nat, and the print routine linux_nat_debug_printf, is used throughout the linux-nat.c file, not just for lwp related debug, so the new name seems to make more sense.
2022-01-01Automatic Copyright Year update after running gdb/copyright.pyJoel Brobecker1-1/+1
This commit brings all the changes made by running gdb/copyright.py as per GDB's Start of New Year Procedure. For the avoidance of doubt, all changes in this commits were performed by the script.
2021-12-22gdb: convert 'set debug lin-lwp' to a boolean commandAndrew Burgess1-9/+20
Convert the 'set debug lin-lwp' command to a boolean. Adds a new LINUX_NAT_SCOPED_DEBUG_ENTER_EXIT macro, and makes use of it in one place (linux_nat_target::stop). The manual entry for 'set debug lin-lwp' is already vague about exactly what arguments this command takes, and the description talks about turning debug on and off, so I don't think there's any updates required there. I have updated the doc strings shown when the users enters 'help show debug lin-lwp' or 'help show debug lin-lwp'. The old title lines used to talk about the 'GNU/Linux lwp module', but this debug flag is now used for any native linux target debug, so we now talk about 'GNU/Linux native target'. The body string for this setting has been changed from 'Enables printf debugging output.' to 'When on, print debug messages relating to the GNU/Linux native target.', the old value looks like a cut&paste error to me.
2021-12-13gdb: make post_startup_inferior a virtual method on inf_ptrace_targetAndrew Burgess1-0/+2
While working on a later patch that required me to understand how GDB starts up inferiors, I was confused by the target_ops::post_startup_inferior method. The post_startup_inferior target function is only called from inf_ptrace_target::create_inferior. Part of the target class hierarchy looks like this: inf_child_target | '-- inf_ptrace_target | |-- linux_nat_target | |-- fbsd_nat_target | |-- nbsd_nat_target | |-- obsd_nat_target | '-- rs6000_nat_target Every sub-class of inf_ptrace_target, except rs6000_nat_target, implements ::post_startup_inferior. The rs6000_nat_target picks up the implementation of ::post_startup_inferior not from inf_ptrace_target, but from inf_child_target. No descendent of inf_child_target, outside the inf_ptrace_target sub-tree, implements ::post_startup_inferior, which isn't really surprising, as they would never see the method called (remember, the method is only called from inf_ptrace_target::create_inferior). What I find confusing is the role inf_child_target plays in implementing, what is really a helper function for just one of its descendents. In this commit I propose that we formally make ::post_startup_inferior a helper function of inf_ptrace_target. To do this I will remove the ::post_startup_inferior from the target_ops API, and instead make this a protected, pure virtual function on inf_ptrace_target. I'll remove the empty implementation of ::post_startup_inferior from the inf_child_target class, and add a new empty implementation to the rs6000_nat_target class. All the other descendents of inf_ptrace_target already provide an implementation of this method and so don't need to change beyond making the method protected within their class declarations. To me, this makes much more sense now. The helper function, which is only called from within the inf_ptrace_target class, is now a part of the inf_ptrace_target class. The only way in which this change is visible to a user is if the user turns on 'set debug target 1'. With this debug flag on, prior to this patch the user would see something like: -> native->post_startup_inferior (...) <- native->post_startup_inferior (2588939) After this patch these lines are no longer present, as the post_startup_inferior is no longer a top level target method. For me, this is an acceptable change.
2021-12-08gdb, gdbserver: detach fork child when detaching from fork parentSimon Marchi1-1/+52
While working with pending fork events, I wondered what would happen if the user detached an inferior while a thread of that inferior had a pending fork event. What happens with the fork child, which is ptrace-attached by the GDB process (or by GDBserver), but not known to the core? Sure enough, neither the core of GDB or the target detach the child process, so GDB (or GDBserver) just stays ptrace-attached to the process. The result is that the fork child process is stuck, while you would expect it to be detached and run. Make GDBserver detach of fork children it knows about. That is done in the generic handle_detach function. Since a process_info already exists for the child, we can simply call detach_inferior on it. GDB-side, make the linux-nat and remote targets detach of fork children known because of pending fork events. These pending fork events can be stored in: - thread_info::pending_waitstatus, if the core has consumed the event but then saved it for later (for example, because it got the event while stopping all threads, to present an all-stop stop on top of a non-stop target) - thread_info::pending_follow: if we ran to a "catch fork" and we detach at that moment Additionally, pending fork events can be in target-specific fields: - For linux-nat, they can be in lwp_info::status and lwp_info::waitstatus. - For the remote target, they could be stored as pending stop replies, saved in `remote_state::notif_state::pending_event`, if not acknowledged yet, or in `remote_state::stop_reply_queue`, if acknowledged. I followed the model of remove_new_fork_children for this: call remote_notif_get_pending_events to process / acknowledge any unacknowledged notification, then look through stop_reply_queue. Update the gdb.threads/pending-fork-event.exp test (and rename it to gdb.threads/pending-fork-event-detach.exp) to try to detach the process while it is stopped with a pending fork event. In order to verify that the fork child process is correctly detached and resumes execution outside of GDB's control, make that process create a file in the test output directory, and make the test wait $timeout seconds for that file to appear (it happens instantly if everything goes well). This test catches a bug in linux-nat.c, also reported as PR 28512 ("waitstatus.h:300: internal-error: gdb_signal target_waitstatus::sig() const: Assertion `m_kind == TARGET_WAITKIND_STOPPED || m_kind == TARGET_WAITKIND_SIGNALLED' failed.). When detaching a thread with a pending event, get_detach_signal unconditionally fetches the signal stored in the waitstatus (`tp->pending_waitstatus ().sig ()`). However, that is only valid if the pending event is of type TARGET_WAITKIND_STOPPED, and this is now enforced using assertions (iit would also be valid for TARGET_WAITKIND_SIGNALLED, but that would mean the thread does not exist anymore, so we wouldn't be detaching it). Add a condition in get_detach_signal to access the signal number only if the wait status is of kind TARGET_WAITKIND_STOPPED, and use GDB_SIGNAL_0 instead (since the thread was not stopped with a signal to begin with). Add another test, gdb.threads/pending-fork-event-ns.exp, specifically to verify that we consider events in pending stop replies in the remote target. This test has many threads constantly forking, and we detach from the program while the program is executing. That gives us some chance that we detach while a fork stop reply is stored in the remote target. To verify that we correctly detach all fork children, we ask the parent to exit by sending it a SIGUSR1 signal and have it write a file to the filesystem before exiting. Because the parent's main thread joins the forking threads, and the forking threads wait for their fork children to exit, if some fork child is not detach by GDB, the parent will not write the file, and the test will time out. If I remove the new remote_detach_pid calls in remote.c, the test fails eventually if I run it in a loop. There is a known limitation: we don't remove breakpoints from the children before detaching it. So the children, could hit a trap instruction after being detached and crash. I know this is wrong, and it should be fixed, but I would like to handle that later. The current patch doesn't fix everything, but it's a step in the right direction. Change-Id: I6d811a56f520e3cb92d5ea563ad38976f92e93dd Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28512
2021-12-08gdb/linux-nat: factor ptrace-detach code to new detach_one_pid functionSimon Marchi1-36/+40
The following patch will add some code paths that need to ptrace-detach a given PID. Factor out the code that does this and put it in its own function, so that it can be re-used. Change-Id: Ie65ca0d89893b41aea0a23d9fc6ffbed042a9705
2021-12-03gdb: change store_waitstatus to return a target_waitstatus by valueSimon Marchi1-3/+3
store_waitstatus is basically a translation function between a status integer and an equivalent target_waitstatus object. It would make sense for it to take the integer as a parameter and return the target_waitstatus by value. Do that, and rename to host_status_to_waitstatus. Users can then do: ws = host_status_to_waitstatus (status) which does the right thing, given the move constructor of target_waitstatus. Change-Id: I7a07d59d3dc19d3ed66929642f82f44f3e85d61b
2021-11-25gdb: hoist target_async_permitted checks into target.cAndrew Burgess1-3/+5
This commit moves the target_async_permitted check out of each targets ::can_async_p method and into the target_can_async_p wrapper function. I've left some asserts in the two ::can_async_p methods that I changed, which will hopefully catch any direct calls to these methods that might be added in the future. There should be no user visible changes after this commit.
2021-11-05Avoid /proc/pid/mem races (PR 28065)Pedro Alves1-185/+139
PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent failure) shows that GDB can hit an unexpected scenario -- it can happen that the kernel manages to open a /proc/PID/task/LWP/mem file, but then reading from the file returns 0/EOF, even though the process hasn't exited or execed. "0" out of read/write is normally what you get when the address space of the process the file was open for is gone, because the process execed or exited. So when GDB gets the 0, it returns memory access failure. In the bad case in question, the process hasn't execed or exited, so GDB fails a memory access when the access should have worked. GDB has code in place to gracefully handle the case of opening the /proc/PID/task/LWP/mem just while the LWP is exiting -- most often the open fails with EACCES or ENOENT. When it happens, GDB just tries opening the file for a different thread of the process. The testcase is written such that it stresses GDB's logic of closing/reopening the /proc/PID/task/LWP/mem file, by constantly spawning short lived threads. However, there's a window where the kernel manages to find the thread, but the thread exits just after and clears its address space pointer. In this case, the kernel creates a file successfully, but the file ends up with no address space associated, so a subsequent read/write returns 0/EOF too, just like if the whole process had execed or exited. This is the case in question that GDB does not handle. Oleg Nesterov gave this suggestion as workaround for that race: gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm. If statm reports something non-zero, then open() was "successfull". I think that might work. However, I didn't try it, because I realized we have another nasty race that that wouldn't fix. The other race I realized is that because we close/reopen the /proc/PID/task/LWP/mem file when GDB switches to a different inferior, then it can happen that GDB reopens /proc/PID/task/LWP/mem just after a thread execs, and before GDB has seen the corresponding exec event. I.e., we can open a /proc/PID/task/LWP/mem file accessing the post-exec address space thinking we're accessing the pre-exec address space. A few months back, Simon, Oleg and I discussed a similar race: [Bug gdb/26754] Race condition when resuming threads and one does an exec https://sourceware.org/bugzilla/show_bug.cgi?id=26754 The solution back then was to make the kernel fail any ptrace operation until the exec event is consumed, with this kernel commit: commit dbb5afad100a828c97e012c6106566d99f041db6 Author: Oleg Nesterov <oleg@redhat.com> AuthorDate: Wed May 12 15:33:08 2021 +0200 Commit: Linus Torvalds <torvalds@linux-foundation.org> CommitDate: Wed May 12 10:45:22 2021 -0700 ptrace: make ptrace() fail if the tracee changed its pid unexpectedly This however, only applies to ptrace, not to the /proc/pid/mem file opening case. Also, even if it did apply to the file open case, we would want to support current kernels until such a fix is more wide spread anyhow. So all in all, this commit gives up on the idea of only ever keeping one /proc/pid/mem file descriptor open. Instead, make GDB open a /proc/pid/mem per inferior, and keep it open until the inferior exits, is detached or execs. Make GDB open the file right after the inferior is created or is attached to or forks, at which point we know the inferior is stable and stopped and isn't thus going to exec, or have a thread exit, and so the file open won't fail (unless the whole process is SIGKILLed from outside GDB, at which point it doesn't matter whether we open the file). This way, we avoid both races described above, at the expense of using more file descriptors (one per inferior). Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065 Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
2021-10-21gdb, gdbserver: make target_waitstatus safeSimon Marchi1-33/+36
I stumbled on a bug caused by the fact that a code path read target_waitstatus::value::sig (expecting it to contain a gdb_signal value) while target_waitstatus::kind was TARGET_WAITKIND_FORKED. This meant that the active union field was in fact target_waitstatus::value::related_pid, and contained a ptid. The read signal value was therefore garbage, and that caused GDB to crash soon after. Or, since that GDB was built with ubsan, this nice error message: /home/simark/src/binutils-gdb/gdb/linux-nat.c:1271:12: runtime error: load of value 2686365, which is not a valid value for type 'gdb_signal' Despite being a large-ish change, I think it would be nice to make target_waitstatus safe against that kind of bug. As already done elsewhere (e.g. dynamic_prop), validate that the type of value read from the union matches what is supposed to be the active field. - Make the kind and value of target_waitstatus private. - Make the kind initialized to TARGET_WAITKIND_IGNORE on target_waitstatus construction. This is what most users appear to do explicitly. - Add setters, one for each kind. Each setter takes as a parameter the data associated to that kind, if any. This makes it impossible to forget to attach the associated data. - Add getters, one for each associated data type. Each getter validates that the data type fetched by the user matches the wait status kind. - Change "integer" to "exit_status", "related_pid" to "child_ptid", just because that's more precise terminology. - Fix all users. That last point is semi-mechanical. There are a lot of obvious changes, but some less obvious ones. For example, it's not possible to set the kind at some point and the associated data later, as some users did. But in any case, the intent of the code should not change in this patch. This was tested on x86-64 Linux (unix, native-gdbserver and native-extended-gdbserver boards). It was built-tested on x86-64 FreeBSD, NetBSD, MinGW and macOS. The rest of the changes to native files was done as a best effort. If I forgot any place to update in these files, it should be easy to fix (unless the change happens to reveal an actual bug). Change-Id: I0ae967df1ff6e28de78abbe3ac9b4b2ff4ad03b7