aboutsummaryrefslogtreecommitdiff
path: root/gdb
AgeCommit message (Collapse)AuthorFilesLines
2022-03-22Fix some stale header names from dwarf filesLuis Machado5-11/+11
Some of these references were not updated when they were moved to a separate directory.
2022-03-21gdb: Add missing #include in solib.hRoland McGrath1-1/+2
The gdb_bfd_ref_ptr type is used in solib.h declarations.
2022-03-21PR gdb/27570: missing support for debuginfod in core_target::build_file_mappingsAaron Merey6-1/+151
Add debuginfod support to core_target::build_file_mappings and locate_exec_from_corefile_build_id to enable the downloading of missing executables and shared libraries referenced in core files. Also add debuginfod support to solib_map_sections so that previously downloaded shared libraries can be retrieved from the local debuginfod cache. When core file shared libraries are found locally, verify that their build-ids match the corresponding build-ids found in the core file. If there is a mismatch, attempt to query debuginfod for the correct build and print a warning if unsuccessful: warning: Build-id of /lib64/libc.so.6 does not match core file. Also disable debuginfod when gcore invokes gdb. Debuginfo is not needed for core file generation so debuginfod queries will slow down gcore unnecessarily.
2022-03-21gdb: Add soname to build-id mapping for core filesAaron Merey4-1/+141
Since commit aa2d5a422 gdb has been able to read executable and shared library build-ids within core files. Expand this functionality so that each core file bfd maintains a map of soname to build-id for each shared library referenced in the core file. This feature may be used to verify that gdb has found the correct shared libraries for core files and to facilitate downloading shared libaries via debuginfod.
2022-03-21Watchpoint followed by catchpoint misreports watchpoint (PR gdb/28621)Pedro Alves5-15/+177
If GDB reports a watchpoint hit, and then the next event is not TARGET_WAITKIND_STOPPED, but instead some event for which there's a catchpoint, such that GDB calls bpstat_stop_status, GDB mistakenly thinks the watchpoint triggered. Vis, using foll-fork.c: (gdb) awatch v Hardware access (read/write) watchpoint 2: v (gdb) catch fork Catchpoint 3 (fork) (gdb) c Continuing. Hardware access (read/write) watchpoint 2: v Old value = 0 New value = 5 main () at gdb.base/foll-fork.c:16 16 pid = fork (); (gdb) Continuing. Hardware access (read/write) watchpoint 2: v <<<< <<<< these lines are spurious Value = 5 <<<< Catchpoint 3 (forked process 1712369), arch_fork (ctid=0x7ffff7fa4810) at arch-fork.h:49 49 arch-fork.h: No such file or directory. (gdb) The problem is that when we handle the fork event, nothing called watchpoints_triggered before calling bpstat_stop_status. Thus, each watchpoint's watchpoint_triggered field was still set to watch_triggered_yes from the previous (real) watchpoint stop. watchpoint_triggered is only current called in the handle_signal_stop path, when handling TARGET_WAITKIND_STOPPED. This fixes it by adding watchpoint_triggered calls in the other events paths that call bpstat_stop_status. But instead of adding them explicitly, it adds a new function bpstat_stop_status_nowatch that wraps bpstat_stop_status and calls watchpoint_triggered, and then replaces most calls to bpstat_stop_status with calls to bpstat_stop_status_nowatch. This required constifying watchpoints_triggered. New test included, which fails without the fix. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28621 Change-Id: I282b38c2eee428d25319af3bc842f9feafed461c
2022-03-21gdb: re-generate config.inSimon Marchi1-3/+0
I'm getting this diff when running `autoreconf -vf`. Change-Id: Id5f009d0f0481935c1ee9df5332cb4bf45fbd32d
2022-03-21gdb/x86: handle stap probe arguments in xmm registersAndrew Burgess7-0/+237
On x86 machines with xmm register, and with recent versions of systemtap (and gcc?), it can occur that stap probe arguments will be placed into xmm registers. I notice this happening on a current Fedora Rawhide install with the following package versions installed: $ rpm -q glibc systemtap gcc glibc-2.35.9000-10.fc37.x86_64 systemtap-4.7~pre16468670g9f253544-1.fc37.x86_64 gcc-12.0.1-0.12.fc37.x86_64 If I check the probe data in libc, I see this: $ readelf -n /lib64/libc.so.6 ... stapsdt 0x0000004d NT_STAPSDT (SystemTap probe descriptors) Provider: libc Name: pthread_start Location: 0x0000000000090ac3, Base: 0x00000000001c65c4, Semaphore: 0x0000000000000000 Arguments: 8@%xmm1 8@1600(%rbx) 8@1608(%rbx) stapsdt 0x00000050 NT_STAPSDT (SystemTap probe descriptors) Provider: libc Name: pthread_create Location: 0x00000000000912f1, Base: 0x00000000001c65c4, Semaphore: 0x0000000000000000 Arguments: 8@%xmm1 8@%r13 8@8(%rsp) 8@16(%rsp) ... Notice that for both of these probes, the first argument is a uint64_t stored in the xmm1 register. Unfortunately, if I try to use this probe within GDB, then I can't view the first argument. Here's an example session: $ gdb $(which gdb) (gdb) start ... (gdb) info probes stap libc pthread_create ... (gdb) break *0x00007ffff729e2f1 # Use address of probe. (gdb) continue ... (gdb) p $_probe_arg0 Invalid cast. What's going wrong? If I re-run my session, but this time use 'set debug stap-expression 1', this is what I see: (gdb) set debug stap-expression 1 (gdb) p $_probe_arg0 Operation: UNOP_CAST Operation: OP_REGISTER String: xmm1 Type: uint64_t Operation: UNOP_CAST Operation: OP_REGISTER String: r13 Type: uint64_t Operation: UNOP_CAST Operation: UNOP_IND Operation: UNOP_CAST Operation: BINOP_ADD Operation: OP_LONG Type: long Constant: 0x0000000000000008 Operation: OP_REGISTER String: rsp Type: uint64_t * Type: uint64_t Operation: UNOP_CAST Operation: UNOP_IND Operation: UNOP_CAST Operation: BINOP_ADD Operation: OP_LONG Type: long Constant: 0x0000000000000010 Operation: OP_REGISTER String: rsp Type: uint64_t * Type: uint64_t Invalid cast. (gdb) The important bit is this: Operation: UNOP_CAST Operation: OP_REGISTER String: xmm1 Type: uint64_t Which is where we cast the xmm1 register to uint64_t. And the final piece of the puzzle is: (gdb) ptype $xmm1 type = union vec128 { v8bf16 v8_bfloat16; v4f v4_float; v2d v2_double; v16i8 v16_int8; v8i16 v8_int16; v4i32 v4_int32; v2i64 v2_int64; uint128_t uint128; } So, we are attempting to cast a union type to a scalar type, which is not supporting in C/C++, and as a consequence GDB's expression evaluator throws an error when we attempt to do this. The first approach I considered for solving this problem was to try and make use of gdbarch_stap_adjust_register. We already have a gdbarch method (gdbarch_stap_adjust_register) that allows us to tweak the name of the register that we access. Currently only x86 architectures use this to transform things like ax to eax in some cases. I wondered, what if we change gdbarch_stap_adjust_register to do more than just change the register names? What if this method instead became gdbarch_stap_read_register. This new method would return a operation_up, and would take the register name, and the type we are trying to read from the register, and return the operation that actually reads the register. The default implementation of this method would just use user_reg_map_name_to_regnum, and then create a register_operation, like we already do in stap_parse_register_operand. But, for x86 architectures this method would fist possibly adjust the register name, then do the default action to read the register. Finally, for x86 this method would spot when we were accessing an xmm register, and, based on the type being pulled from the register, would extract the correct field from the union. The benefit of this approach is that it would work with the expression types that GDB currently supports. The draw back would be that this approach would not be very generic. We'd need code to handle each sub-field size with an xmm register. If other architectures started using vector registers for probe arguments, those architectures would have to create their own gdbarch_stap_read_register method. And finally, the type of the xmm registers comes from the type defined in the target description, there's a risk that GDB might end up hard-coding the names of type sub-fields, then if a target uses a different target description, with different field names for xmm registers, the stap probes would stop working. And so, based on all the above draw backs, I rejected this first approach. My second plan involves adding a new expression type to GDB called unop_extract_operation. This new expression takes a value and a type, during evaluation the value contents are fetched, and then a new value is extracted from the value contents (based on type). This is similar to the following C expression: result_value = *((output_type *) &input_value); Obviously we can't actually build this expression in this case, as the input_value is in a register, but hopefully the above makes it clearer what I'm trying to do. The benefit of the new expression approach is that this code can be shared across all architectures, and it doesn't care about sub-field names within the union type. The draw-backs that I see are potential future problems if arguments are not stored within the least significant bytes of the register. However if/when that becomes an issue we can adapt the gdbarch_stap_read_register approach to allow architectures to control how a value is extracted. For testing, I've extended the existing gdb.base/stap-probe.exp test to include a function that tries to force an argument into an xmm register. Obviously, that will only work on a x86 target, so I've guarded the new function with an appropriate GCC define. In the exp script we use readelf to check if the probe exists, and is using the xmm register. If the probe doesn't exist then the associated tests are skipped. If the probe exists, put isn't using the xmm register (which will depend on systemtap/gcc versions), then again, the tests are skipped. Otherwise, we can run the test. I think the cost of running readelf is pretty low, so I don't feel too bad making all the non-xmm targets running this step. I found that on a Fedora 35 install, with these packages installed, I was able to run this test and have the probe argument be placed in an xmm register: $ rpm -q systemtap gcc glibc systemtap-4.6-4.fc35.x86_64 gcc-11.2.1-9.fc35.x86_64 glibc-2.34-7.fc35.x86_64 Finally, as this patch adds a new operation type, then I need to consider how to generate an agent expression for the new operation type. I have kicked the can down the road a bit on this. In the function stap_parse_register_operand, I only create a unop_extract_operation in the case where the register type is non-scalar, this means that in most cases I don't need to worry about generating an agent expression at all. In the xmm register case, when an unop_extract_operation will be created, I have sketched out how the agent expression could be handled, however, this code is currently not reached. When we try to generate the agent expression to place the xmm register on the stack, GDB hits this error: (gdb) trace -probe-stap test:xmmreg Tracepoint 1 at 0x401166 (gdb) actions Enter actions for tracepoint 1, one per line. End with a line saying just "end". >collect $_probe_arg0 Value not scalar: cannot be an rvalue. This is because GDB doesn't currently support placing non-scalar types on the agent expression evaluation stack. Solving this is clearly related to the original problem, but feels a bit like a second problem. I'd like to get feedback on whether my approach to solving the original problem is acceptable or not before I start looking at how to handle xmm registers within agent expressions.
2022-03-21gdb/testsuite: reformat gdb.python/pretty-print-call-by-hand.pySimon Marchi1-3/+7
Run black on the file. Change-Id: Ifb576137fb7158a0227173f61c1202f0695b3685
2022-03-21[gdb/testsuite] test a function call by hand from pretty printerBruno Larsen3-0/+221
The test case added here is testing the bug gdb/28856, where calling a function by hand from a pretty printer makes GDB crash. There are 6 mechanisms to trigger this crash in the current test, using the commands backtrace, up, down, finish, step and continue. Since the failure happens because of use-after-free (more details below) the tests will always have a chance of passing through sheer luck, but anecdotally they seem to fail all of the time. The reason GDB is crashing is a use-after-free problem. The above mentioned functions save a pointer to the current frame's information, then calls the pretty printer, and uses the saved pointer for different reasons, depending on the function. The issue happens because call_function_by_hand needs to reset the obstack to get the current frame, invalidating the saved pointer.
2022-03-21gdb/testsuite: Installed-GDB testing & data-directoryPedro Alves2-6/+51
In testsuite/README, we suggest that you can run the testsuite against some other GDB binary by using: make check RUNTESTFLAGS=GDB=/usr/bin/gdb However, that example isn't fully correct, because with that command line, the testsuite will still pass -data-directory=[pwd]/../data-directory to /usr/bin/gdb, like e.g.: ... builtin_spawn /usr/bin/gdb -nw -nx -data-directory /home/pedro/gdb/build/gdb/testsuite/../data-directory -iex set height 0 -iex set width 0 ... while if you're testing an installed GDB (the system GDB being the most usual scenario), then you should normally let it use its own configured directory, not the just-built GDB's data directory. This commit improves the status quo with the following two changes: - if the user specifies GDB on the command line, then by default, don't start GDB with the -data-directory command line option. I.e., let the tested GDB use its own configured data directory. - let the user override the data directory, via a new GDB_DATA_DIRECTORY global. This replaces the existing BUILD_DATA_DIRECTORY variable in testsuite/lib/gdb.exp, which wasn't overridable, and was a bit misnamed for the new purpose. So after this, the following commands I believe behave intuitively: # Test the non-installed GDB in some build dir: make check \ RUNTESTFLAGS="GDB=/path/to/other/build/gdb \ GDB_DATA_DIRECTORY=/path/to/other/build/gdb/data-directory" # Test the GDB installed in some prefix: make check \ RUNTESTFLAGS="GDB=/opt/gdb/bin/gdb" # Test the built GDB with some alternative data directory, e.g., the system GDB's data directory: make check \ RUNTESTFLAGS="GDB_DATA_DIRECTORY=/usr/share/gdb" Change-Id: Icdc21c85219155d9564a9900961997e6624b78fb
2022-03-21Add support for readline 8.2Andreas Schwab1-2/+2
In readline 8.2 the type of rl_completer_word_break_characters changed to include const.
2022-03-20Update gdb/NEWS after GDB 12 branch creation.Joel Brobecker1-1/+3
This commit a new section for the next release branch, and renames the section of the current branch, now that it has been cut.
2022-03-20Bump version to 13.0.50.DATE-git.Joel Brobecker2-2/+2
Now that the GDB 12 branch has been created, this commit bumps the version number in gdb/version.in to 13.0.50.DATE-git For the record, the GDB 12 branch was created from commit 2be64de603f8b3ae359d2d3fbf5db0e79869f32b. Also, as a result of the version bump, the following changes have been made in gdb/testsuite: * gdb.base/default.exp: Change $_gdb_major to 13.
2022-03-18gdb/python: remove gdb._mi_commands dictSimon Marchi7-208/+105
The motivation for this patch is the fact that py-micmd.c doesn't build with Python 2, due to PyDict_GetItemWithError being a Python 3-only function: CXX python/py-micmd.o /home/smarchi/src/binutils-gdb/gdb/python/py-micmd.c: In function ‘int micmdpy_uninstall_command(micmdpy_object*)’: /home/smarchi/src/binutils-gdb/gdb/python/py-micmd.c:430:20: error: ‘PyDict_GetItemWithError’ was not declared in this scope; did you mean ‘PyDict_GetItemString’? 430 | PyObject *curr = PyDict_GetItemWithError (mi_cmd_dict.get (), | ^~~~~~~~~~~~~~~~~~~~~~~ | PyDict_GetItemString A first solution to fix this would be to try to replace PyDict_GetItemWithError equivalent Python 2 code. But I looked at why we are doing this in the first place: it is to maintain the `gdb._mi_commands` Python dictionary that we use as a `name -> gdb.MICommand object` map. Since the `gdb._mi_commands` dictionary is never actually used in Python, it seems like a lot of trouble to use a Python object for this. My first idea was to replace it with a C++ map (std::unordered_map<std::string, gdbpy_ref<micmdpy_object>>). While implementing this, I realized we don't really need this map at all. The mi_command_py objects registered in the main MI command table can own their backing micmdpy_object (that's a gdb.MICommand, but seen from the C++ code). To know whether an mi_command is an mi_command_py, we can use a dynamic cast. Since there's one less data structure to maintain, there are less chances of messing things up. - Change mi_command_py::m_pyobj to a gdbpy_ref, the mi_command_py is now what keeps the MICommand alive. - Set micmdpy_object::mi_command in the constructor of mi_command_py. If mi_command_py manages setting/clearing that field in swap_python_object, I think it makes sense that it also takes care of setting it initially. - Move a bunch of checks from micmdpy_install_command to swap_python_object, and make them gdb_asserts. - In micmdpy_install_command, start by doing an mi_cmd_lookup. This is needed to know whether there's a Python MI command already registered with that name. But we can already tell if there's a non-Python command registered with that name. Return an error if that happens, rather than waiting for insert_mi_cmd_entry to fail. Change the error message to "name is already in use" rather than "may already be in use", since it's more precise. I asked Andrew about the original intent of using a Python dictionary object to hold the command objects. The reason was to make sure the objects get destroyed when the Python runtime gets finalized, not later. Holding the objects in global C++ data structures and not doing anything more means that the held Python objects will be decref'd after the Python interpreter has been finalized. That's not desirable. I tried it and it indeed segfaults. Handle this by adding a gdbpy_finalize_micommands function called in finalize_python. This is the mirror of gdbpy_initialize_micommands called in do_start_initialization. In there, delete all Python MI commands. I think it makes sense to do it this way: if it was somehow possible to unload Python support from GDB in the middle of a session we'd want to unregister any Python MI command. Otherwise, these MI commands would be backed with a stale PyObject or simply nothing. Delete tests that were related to `gdb._mi_commands`. Co-Authored-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I060d5ebc7a096c67487998a8a4ca1e8e56f12cd3
2022-03-18Fix crash with stepi, no debug info, and "set debug infrun 1"Pedro Alves1-1/+2
A stepi in a function without debug info with "set debug infrun 1" crashes GDB since commit c8353d682f69 (gdb/infrun: some extra infrun debug print statements), due to a reference to "tp->current_symtab->filename" when tp->current_symtab is null. This commit adds the missing null check. The output in this case becomes: [infrun] set_step_info: symtab = <null>, line = 0, step_frame_id = {stack=0x7fffffffd980,code=0x0000000000456c30,!special}, step_stack_frame_id = {stack=0x7fffffffd980,code=0x0000000000456c30,!special} Change-Id: I5171a9d222bc7e15b9dba2feaba7d55c7acd99f8
2022-03-18Implement gdbarch_stack_frame_destroyed_p for aarch64Tom Tromey1-0/+22
The internal AdaCore testsuite has a test that checks that an out-of-scope watchpoint is deleted. This fails on some aarch64 configurations, reporting an extra stop: (gdb) continue Continuing. Thread 3 hit Watchpoint 2: result Old value = 64 New value = 0 0x0000000040021648 in pck.get_val (seed=0, off_by_one=false) at [...]/pck.adb:13 13 end Get_Val; I believe what is happening here is that the variable is stored at: <efa> DW_AT_location : 2 byte block: 91 7c (DW_OP_fbreg: -4) and the extra stop is reported just before a return, when the ldp instruction is executed: 0x0000000040021644 <+204>: ldp x29, x30, [sp], #48 0x0000000040021648 <+208>: ret This instruction modifies the frame base calculation, and so the test picks up whatever memory is pointed to in the callee frame. Implementing the gdbarch hook gdbarch_stack_frame_destroyed_p fixes this problem. As usual with this sort of patch, it has passed internal testing, but I don't have a good way to try it with dejagnu. So, I don't know whether some existing test covers this. I suspect there must be one, but it's also worth noting that this test passes for aarch64 in some configurations -- I don't know what causes one to fail and another to succeed.
2022-03-18gdb: run black to format some Python filesSimon Marchi2-2/+4
Seems like some leftovers from previous commits. Change-Id: I7155ccdf7a4fef83bcb3d60320252c3628efdb83
2022-03-17Remove fall throughs in core_target::xfer_partial.John Baldwin1-2/+2
The cases for TARGET_OBJECT_LIBRARIES and TARGET_OBJECT_LIBRARIES_AIX can try to fetch different data objects (such as TARGET_OBJECT_SIGNAL_INFO) if gdbarch methods for the requested data aren't present. Return with TARGET_XFER_E_IO if the gdbarch method isn't present instead.
2022-03-17gdb: Remove support for S+corePedro Alves5-1586/+5
GCC removed support for score back in 2014 already. Back then, we basically agreed about removing it from GDB too, but it ended up being forgotten. See: https://sourceware.org/pipermail/gdb/2014-October/044643.html Following through this time around. Change-Id: I5b25a1ff7bce7b150d6f90f4c34047fae4b1f8b4
2022-03-17Add another test for Ada Wide_Wide_StringTom Tromey4-1/+17
In an earlier patch, I had written that I wanted to add this test: ptype Wide_Wide_String'("literal") ... but that it failed with the distro GNAT. Further investigation showed that it could be made to work by adding a function using Wide_Wide_String to the program -- this caused the type to end up in the debug info. This patch adds the test. I'm checking this in.
2022-03-16gdb: work around prompt corruption caused by bracketed-paste-modeAndrew Burgess3-1/+116
In this commit: commit b4f26d541aa7224b70d363932e816e6e1a857633 Date: Tue Mar 2 13:42:37 2021 -0700 Import GNU Readline 8.1 We imported readline 8.1 into GDB. As a consequence bug PR cli/28833 was reported. This bug spotted that, when the user terminated GDB by sending EOF (usually bound to Ctrl+d), the last prompt would become corrupted. Here's what happens, the user is sat at a prompt like this: (gdb) And then the user sends EOF (Ctrl+d), we now see this: quit) ... gdb terminates, and we return to the shell ... Notice the 'quit' was printed over the prompt. This problem is a result of readline 8.1 enabling bracketed paste mode by default. This problem is present in readline 8.0 too, but in that version of readline bracketed paste mode is off by default, so a user will not see the bug unless they specifically enable the feature. Bracketed paste mode is available in readline 7.0 too, but the bug is not present in this version of readline, see below for why. What causes this problem is how readline disables bracketed paste mode. Bracketed paste mode is a terminal feature that is enabled and disabled by readline emitting a specific escape sequence. The problem for GDB is that the escape sequence to disable bracketed paste mode includes a '\r' character at the end, see this thread for more details: https://lists.gnu.org/archive/html/bug-bash/2018-01/msg00097.html The change to add the '\r' character to the escape sequence used to disable bracketed paste mode was introduced between readline 7.0 and readline 8.0, this is why the bug would not occur when using older versions of readline (note: I don't know if its even possible to build GDB using readline 7.0. That really isn't important, I'm just documenting the history of this issue). So, the escape sequence to disable bracketed paste mode is emitted from the readline function rl_deprep_terminal, this is called after the user has entered a complete command and pressed return, or, if the user sends EOF. However, these two cases are slightly different. In the first case, when the user has entered a command and pressed return, the cursor will have moved to the next, empty, line, before readline emits the escape sequence to leave bracketed paste mode. The final '\r' character moves the cursor back to the beginning of this empty line, which is harmless. For the EOF case though, this is not what happens. Instead, the escape sequence to leave bracketed paste mode is emitted on the same line as the prompt. The final '\r' moves the cursor back to the start of the prompt line. This leaves us ready to override the prompt. It is worth noting, that this is not the intended behaviour of readline, in rl_deprep_terminal, readline should emit a '\n' character when EOF is seen. However, due to a bug in readline this does not happen (the _rl_eof_found flag is never set). This is the first readline bug that effects GDB. GDB prints the 'quit' message from command_line_handler (in event-top.c), this function is called (indirectly) from readline to process the complete command line, but also in the EOF case (in which case the command line is set to nullptr). As this is part of the callback to process a complete command, this is called after readline has disabled bracketed paste mode (by calling rl_deprep_terminal). And so, when bracketed paste mode is in use, rl_deprep_terminal leaves the cursor at the start of the prompt line (in the EOF case), and command_line_handler then prints 'quit', which overwrites the prompt. The solution to this problem is to print the 'quit' message earlier, before rl_deprep_terminal is called. This is easy to do by using the rl_deprep_term_function hook. It is this hook that usually calls rl_deprep_terminal, however, if we replace this with a new function, we can print the 'quit' string, and then call rl_deprep_terminal ourselves. This allows the 'quit' to be printed before rl_deprep_terminal is called. The problem here is that there is no way in rl_deprep_terminal to know if readline is processing EOF or not, and as a result, we don't know when we should print 'quit'. This is the second readline bug that effects GDB. Both of these readline issues are discussed in this thread: https://lists.gnu.org/archive/html/bug-readline/2022-02/msg00021.html The result of that thread was that readline was patched to address both of these issues. Now it should be easy to backport the readline fix to GDB's in tree copy of readline, and then change GDB to make use of these fixes to correctly print the 'quit' string. However, we are just about to branch GDB 12, and there is concern from some that changing readline this close to a new release is a risky idea, see this thread: https://sourceware.org/pipermail/gdb-patches/2022-March/186391.html So, this commit doesn't change readline at all. Instead, this commit is the smallest possible GDB change in order to avoid the prompt corruption. In this commit I change GDB to print the 'quit' string on the line after the prompt, but only when bracketed paste mode is on. This avoids the overwriting issue, the user sees this: (gdb) quit ... gdb terminates, and returns to the shell ... This isn't ideal, but is better than the existing behaviour. After GDB 12 has branched, we can backport the readline fix, and apply a real fix to GDB. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28833
2022-03-16Reimplement array concatenation for Ada and DTom Tromey7-134/+131
This started as a patch to implement string concatenation for Ada. However, while working on this, I looked at how this code could possibly be called. It turns out there are only two users of concat_operation: Ada and D. So, in addition to implementing this for Ada, this patch rewrites value_concat, removing the odd "concatenate or repeat" semantics, which were completely unused. As Ada and D both seem to represent strings using TYPE_CODE_ARRAY, this removes the TYPE_CODE_STRING code from there as well.
2022-03-16Remove eval_op_concatTom Tromey3-18/+2
eval_op_concat has code to search for an operator overload of BINOP_CONCAT. However, the operator overloading code is specific to C++, which does not have this operator. And, binop_types_user_defined_p rejects this case right at the start, and value_x_binop does not handle this case. I think this code has been dead for a very long time. This patch removes it and hoists the remaining call into concatenation::evaluate, removing eval_op_concat entirely.
2022-03-16Ada support for wide stringsTom Tromey3-6/+63
This adds some basic support for Wide_String and Wide_Wide_String to the Ada expression evaluator. In particular, a string literal may be converted to a wide or wide-wide string depending on context. The patch updates an existing test case. Note that another test, namely something like: ptype Wide_Wide_String'("literal") ... would be nice to add, but when tested against a distro GNAT, this did not work (probably due to lack of debuginfo); so, I haven't included it here.
2022-03-16Remove eval_op_stringTom Tromey2-15/+11
eval_op_string is only used in a single place -- the implementation of string_operation. This patch turns it into the string_operation::evaluate method, removing a bit of extraneous code.
2022-03-16Powerpc fix for gdb.base/ending-run.expCarl Love1-0/+16
The last two tests in gdb.base/ending-run.exp case fail on Powerpc when the system does not have the needed glibc debug-info files loaded. In this case, gdb is not able to determine where execution stopped. This behavior looks as follows for the test case: The next to the last test does a next command when the program is stopped at the closing bracket for main. The message printed is: 0x00007ffff7d01524 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6 which fails to match any of the test_multiple options. The test then does another next command. On Powerpc, the message printed it: Cannot find bounds of current function The test fails as the output does not match any of the options for the gdb_test_multiple. I checked the behavior on Powerpc to see if this is typical. I ran gdb on the following simple program as shown below. #include <stdio.h> int main(void) { printf("Hello, world!\n"); return 0; } gdb ./hello_world <snip the gdb start info> Type "apropos word" to search for commands related to "word"... Reading symbols from ./hello_world... (No debugging symbols found in ./hello_world) (gdb) break main Breakpoint 1 at 0x818 (gdb) r Starting program: /home/carll/hello_world [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1". Breakpoint 1, 0x0000000100000818 in main () (gdb) n Single stepping until exit from function main, which has no line number information. Hello, world! 0x00007ffff7d01524 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6 (gdb) n Cannot find bounds of current function So it would seem that the messages seen from the test case are "normal" output for Powerpc when the debug-info is not available. The following patch adds the output from Powerpc as an option to the gdb_test_multiple statement, identifying the output as the expected output on Powerpc without the needed debug-info files installed. The patch has been tested on a Power 10 system and an Intel 64-bit system. No additional regression failures were seen on either platform.
2022-03-16gdb/mi: consistently notify user when GDB/MI client uses -thread-selectJan Vrany8-69/+106
GDB notifies users about user selected thread changes somewhat inconsistently as mentioned on gdb-patches mailing list here: https://sourceware.org/pipermail/gdb-patches/2022-February/185989.html Consider GDB debugging a multi-threaded inferior with both CLI and GDB/MI interfaces connected to separate terminals. Assuming inferior is stopped and thread 1 is selected, when a thread 2 is selected using '-thread-select 2' command on GDB/MI terminal: -thread-select 2 ^done,new-thread-id="2",frame={level="0",addr="0x00005555555551cd",func="child_sub_function",args=[],file="/home/jv/Projects/gdb/users_jv_patches/gdb/testsuite/gdb.mi/user-selected-context-sync.c",fullname="/home/uuu/gdb/gdb/testsuite/gdb.mi/user-selected-context-sync.c",line="30",arch="i386:x86-64"} (gdb) and on CLI terminal we get the notification (as expected): [Switching to thread 2 (Thread 0x7ffff7daa640 (LWP 389659))] #0 child_sub_function () at /home/uuu/gdb/gdb/testsuite/gdb.mi/user-selected-context-sync.c:30 30 volatile int dummy = 0; However, now that thread 2 is selected, if thread 1 is selected using 'thread-select --thread 1 1' command on GDB/MI terminal terminal: -thread-select --thread 1 1 ^done,new-thread-id="1",frame={level="0",addr="0x0000555555555294",func="main",args=[],file="/home/jv/Projects/gdb/users_jv_patches/gdb/testsuite/gdb.mi/user-selected-context-sync.c",fullname="/home/jv/Projects/gdb/users_jv_patches/gdb/testsuite/gdb.mi/user-selected-context-sync.c",line="66",arch="i386:x86-64"} (gdb) but no notification is printed on CLI terminal, despite the fact that user selected thread has changed. The problem is that when `-thread-select --thread 1 1` is executed then thread is switched to thread 1 before mi_cmd_thread_select () is called, therefore the condition "inferior_ptid != previous_ptid" there does not hold. To address this problem, we have to move notification logic up to mi_cmd_execute () where --thread option is processed and notify user selected contents observers there if context changes. However, this in itself breaks GDB/MI because it would cause context notification to be sent on MI channel. This is because by the time we notify, MI notification suppression is already restored (done in mi_command::invoke(). Therefore we had to lift notification suppression logic also up to mi_cmd_execute (). This change in made distinction between mi_command::invoke() and mi_command::do_invoke() unnecessary as all mi_command::invoke() did (after the change) was to call do_invoke(). So this patches removes do_invoke() and moves the command execution logic directly to invoke(). With this change, all gdb.mi tests pass, tested on x86_64-linux. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20631
2022-03-16Make gdb.fortran/{array-slices,lbound-ubound} work against gdbserverPedro Alves3-15/+45
gdb.fortran/array-slices.exp and gdb.fortran/lbound-ubound.exp were recently disabled unless testing with the native target, because they rely on inferior I/O. However, when testing against gdbserver using the native-gdbserver/native-extended-gdbserver boards, we do have access to inferior I/O. The right way to check whether the board can do I/O, is via checking the gdb,noinferiorio board variable. Switch to using that. And then, tweak the testcases to expect output to appear in inferior_spawn_id, instead of gdb_spawn_id. When testing against the native target, inferior_spawn_id is the same as gdb_spawn_id. When testing against gdbserver, it maps to gdbserver_spawn_id. This exposed a buglet in gdb.fortran/array-slices.f90's show_1d subroutine -- it was missing printing newline at the end of the "Expected GDB Output" text, leading to a test timeout. All other subroutines end with advance=yes, except this one. Fix it by using advance=yes here too. Change-Id: I4640729f334431cfc24b0917e7d3977b677c6ca5
2022-03-15Do not capture updated 'pc' in add_local_symbolsTom Tromey1-2/+2
Simon pointed out that commit 13835d88 ("Use function view when iterating over block symbols") caused a regression. The bug is that the new closure captures 'pc' by reference, but later code updates this variable -- but the earlier code did not update the callback structure with the new value. This patch restores the old behavior by using a new varible name in an inner scope.
2022-03-15gdb/testsuite: rename a proc and fix a typoAndrew Burgess1-41/+41
Rename a proc in gdb.mi/user-selected-context-sync.exp, I think the old name was most likely a typo. The old name match_re_or_ensure_not_output seems (to me) to imply we're in some way checking that the regexp was not output. But that's not what we are doing, we're checking either for the regexp, or for no output, hence the new name match_re_or_ensure_no_output. Additionally, I found a definite typo in one of the comments that I've also fixed. I also updated some test names. These tests (probably due to copy & paste errors) has 'on MI' on their name, when they were actually checking CLI output. For these test I changed the name to use 'on CLI'. There should be no change in what is tested after this commit.
2022-03-15gdb: LoongArch: fix failed testcases in gdb.base/align-c.expTiezhu Yang1-0/+4
When execute the following command on LoongArch: make check-gdb TESTS="gdb.base/align-c.exp" there exist some failed testcases: ... FAIL: gdb.base/align-c.exp: print _Alignof(struct align_pair_long_double_x_float) FAIL: gdb.base/align-c.exp: print _Alignof(struct align_pair_long_double_x_double) FAIL: gdb.base/align-c.exp: print _Alignof(struct align_pair_long_double_x_long_double) ... According to LoongArch ELF ABI specification [1], set the target data types of floating-point to fix the above failed testcases. [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
2022-03-14gdb/python/mi: create MI commands using pythonAndrew Burgess12-18/+1553
This commit allows a user to create custom MI commands using Python similarly to what is possible for Python CLI commands. A new subclass of mi_command is defined for Python MI commands, mi_command_py. A new file, gdb/python/py-micmd.c contains the logic for Python MI commands. This commit is based on work linked too from this mailing list thread: https://sourceware.org/pipermail/gdb/2021-November/049774.html Which has also been previously posted to the mailing list here: https://sourceware.org/pipermail/gdb-patches/2019-May/158010.html And was recently reposted here: https://sourceware.org/pipermail/gdb-patches/2022-January/185190.html The version in this patch takes some core code from the previously posted patches, but also has some significant differences, especially after the feedback given here: https://sourceware.org/pipermail/gdb-patches/2022-February/185767.html A new MI command can be implemented in Python like this: class echo_args(gdb.MICommand): def invoke(self, args): return { 'args': args } echo_args("-echo-args") The 'args' parameter (to the invoke method) is a list containing (almost) all command line arguments passed to the MI command (--thread and --frame are handled before the Python code is called, and removed from the args list). This list can be empty if the MI command was passed no arguments. When used within gdb the above command produced output like this: (gdb) -echo-args a b c ^done,args=["a","b","c"] (gdb) The 'invoke' method of the new command must return a dictionary. The keys of this dictionary are then used as the field names in the mi command output (e.g. 'args' in the above). The values of the result returned by invoke can be dictionaries, lists, iterators, or an object that can be converted to a string. These are processed recursively to create the mi output. And so, this is valid: class new_command(gdb.MICommand): def invoke(self,args): return { 'result_one': { 'abc': 123, 'def': 'Hello' }, 'result_two': [ { 'a': 1, 'b': 2 }, { 'c': 3, 'd': 4 } ] } Which produces output like: (gdb) -new-command ^done,result_one={abc="123",def="Hello"},result_two=[{a="1",b="2"},{c="3",d="4"}] (gdb) I have required that the fields names used in mi result output must match the regexp: "^[a-zA-Z][-_a-zA-Z0-9]*$" (without the quotes). This restriction was never written down anywhere before, but seems sensible to me, and we can always loosen this rule later if it proves to be a problem. Much harder to try and add a restriction later, once people are already using the API. What follows are some details about how this implementation differs from the original patch that was posted to the mailing list. In this patch, I have changed how the lifetime of the Python gdb.MICommand objects is managed. In the original patch, these object were kept alive by an owned reference within the mi_command_py object. As such, the Python object would not be deleted until the mi_command_py object itself was deleted. This caused a problem, the mi_command_py were held in the global mi command table (in mi/mi-cmds.c), which, as a global, was not cleared until program shutdown. By this point the Python interpreter has already been shutdown. Attempting to delete the mi_command_py object at this point was causing GDB to try and invoke Python code after finalising the Python interpreter, and we would crash. To work around this problem, the original patch added code in python/python.c that would search the mi command table, and delete the mi_command_py objects before the Python environment was finalised. In contrast, in this patch, I have added a new global dictionary to the gdb module, gdb._mi_commands. We already have several such global data stores related to pretty printers, and frame unwinders. The MICommand objects are placed into the new gdb.mi_commands dictionary, and it is this reference that keeps the objects alive. When GDB's Python interpreter is shut down gdb._mi_commands is deleted, and any MICommand objects within it are deleted at this point. This change avoids having to make the mi_cmd_table global, and walk over it from within GDB's python related code. This patch handles command redefinition entirely within GDB's python code, though this does impose one small restriction which is not present in the original code (detailed below), I don't think this is a big issue. However, the original patch relied on being able to finish executing the mi_command::do_invoke member function after the mi_command object had been deleted. Though continuing to execute a member function after an object is deleted is well defined, it is also (IMHO) risky, its too easy for someone to later add a use of the object without realising that the object might sometimes, have been deleted. The new patch avoids this issue. The one restriction that is added to avoid this, is that an MICommand object can't be reinitialised with a different command name, so: (gdb) python cmd = MyMICommand("-abc") (gdb) python cmd.__init__("-def") can't reinitialize object with a different command name This feels like a pretty weird edge case, and I'm happy to live with this restriction. I have also changed how the memory is managed for the command name. In the most recently posted patch series, the command name is moved into a subclass of mi_command, the python mi_command_py, which inherits from mi_command is then free to use a smart pointer to manage the memory for the name. In this patch, I leave the mi_command class unchanged, and instead hold the memory for the name within the Python object, as the lifetime of the Python object always exceeds the c++ object stored in the mi_cmd_table. This adds a little more complexity in py-micmd.c, but leaves the mi_command class nice and simple. Next, this patch adds some extra functionality, there's a MICommand.name read-only attribute containing the name of the command, and a read-write MICommand.installed attribute that can be used to install (make the command available for use) and uninstall (remove the command from the mi_cmd_table so it can't be used) the command. This attribute will be automatically updated if a second command replaces an earlier command. This patch adds additional error handling, and makes more use the gdbpy_handle_exception function. Co-Authored-By: Jan Vrany <jan.vrany@labware.com>
2022-03-14gdb/gdbarch: compare some fields against 0 verify_gdbarchAndrew Burgess3-6/+27
After the previous commit, which removes the predicate function gdbarch_register_type_p, I noticed that the gdbarch->register_type field was not checked at in the verify_gdbarch function. More than not being checked, the field wasn't mentioned at all. I find this strange, I would expect that every field would at least be mentioned - we already generate comments for some fields saying that this field is _not_ being checked, so the fact that this field isn't being checked looks (to me), like this field is somehow slipping through the cracks. The comment at the top of gdbarch-components.py tries to explain how the validation is done. I didn't understand this comment completely, but, I think this final sentence: "Otherwise, the check is done against 0 (really NULL for function pointers, but same idea)." Means that, if non of the other cases apply, then the field should be checked against 0, with 0 indicating that the field is invalid (was not set by the tdep code). However, this is clearly not being done. Looking in gdbarch.py at the code to generate verify_gdbarch we do find that there is a case that is not handled, the case where the 'invalid' field is set true True, but non of the other cases apply. In this commit I propose two changes: 1. Handle the case where the 'invalid' field of a property is set to True, this should perform a check for the field of gdbarch still being set to 0, and 2. If the if/else series that generates verify_gdbarch doesn't handle a property then we should raise an exception. This means that if a property is added which isn't handled, we should no longer silently ignore it. After doing this, I re-generated the gdbarch files and saw that the following gdbarch fields now had new validation checks: register_type believe_pcc_promotion register_to_value value_to_register frame_red_zone_size displaced_step_restore_all_in_ptid solib_symbols_extension Looking at how these are currently set in the various -tdep.c files, I believe the only one of these that is required to be set for all architectures is the register_type field. And so, for all of the other fields, I've changed the property definition on gdbarch-components.py, setting the 'invalid' field to False. Now, after re-generation, the register_type field is checked against 0, thus an architecture that doesn't set_gdbarch_register_type will now fail during validation. For all the other fields we skip the validation, in which case, it is find for an architecture to not set this field. My expectation is that there should be no user visible changes after this commit. Certainly for all fields except register_type, all I've really done is cause some extra comments to be generated, so I think that's clearly fine. For the register_type field, my claim is that any architecture that didn't provide this would fail when creating its register cache, and I couldn't spot an architecture that doesn't provide this hook. As such, I think this change should be fine too.
2022-03-14gdb/gdbarch: remove the predicate function for gdbarch_register_typeAndrew Burgess3-14/+0
I don't believe that the gdbarch_register_type_p predicate is called anywhere in GDB, and the gdbarch_register_type function is called without checking the gdbarch_register_type_p predicate function everywhere it is used, for example in init_regcache_descr (regcache.c). My claim is that the gdbarch_register_type function is required for every architecture, and GDB will not work if this function is not supplied. And so, in this commit, I remove the 'predicate=True' from gdbarch-components.py for the 'register_type' field, and regenerate the gdbarch files. There should be no user visible changes after this commit.
2022-03-14Replace deprecated_target_wait_hook by observersPatrick Monnerat7-24/+26
Commit b60cea7 (Make target_wait options use enum flags) broke deprecated_target_wait_hook usage: there's a commit comment telling this hook has not been converted. Rather than trying to mend it, this patch replaces the hook by two target_wait observers: target_pre_wait (ptid_t ptid) target_post_wait (ptid_t event_ptid) Upon target_wait entry, target_pre_wait is notified with the ptid passed to target_wait. Upon exit, target_post_wait is notified with the event ptid returned by target_wait. Should an exception occur, event_ptid is null_ptid. This change benefits to Insight (out-of-tree): there's no real use of the late hook in gdb itself.
2022-03-14Correctly print subrange types in generic_value_printTom Tromey2-1/+86
I noticed that generic_value_print assumes that a subrange type is always a subrange of an integer type. However, this isn't necessarily the case. In Ada, for example, one has subranges of character and enumeration types. This code isn't often exercised, I think, because languages with real subrange types tend to implement their own printers. However, it still seemed worth fixing.
2022-03-14[aarch64/arm] Properly extract the return value returned in memoryLuis Machado4-5/+130
When running gdb.cp/non-trivial-retval.exp, the following shows up for both aarch64-linux and armhf-linux: Breakpoint 3, f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35 35 A a; (gdb) finish Run till exit from #0 f1 (i1=23, i2=100) at src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:35 main () at /src/gdb/testsuite/gdb.cp/non-trivial-retval.cc:163 163 B b = f2 (i1, i2); Value returned is $6 = {a = -11952} (gdb) The return value should be {a = 123} instead. This happens because the backends don't extract the return value from the correct location. GDB should fetch a pointer to the memory location from X8 for aarch64 and r0 for armhf. With the patch, gdb.cp/non-trivial-retval.exp has full passes on aarch64-linux and armhf-linux on Ubuntu 20.04/18.04. The problem only shows up with the "finish" command. The "call" command works correctly and displays the correct return value. This is also related to PR gdb/28681 (https://sourceware.org/bugzilla/show_bug.cgi?id=28681) and fixes FAIL's in gdb.ada/mi_var_array.exp. A new testcase is provided, and it exercises GDB's ability to "finish" a function that returns a large struct (> 16 bytes) and display the contents of this struct correctly. This has always been incorrect for these backends, but no testcase exercised this particular scenario.
2022-03-12Relax regexp in gdb.rust/unsized.expTom Tromey1-2/+1
With nightly rustc, gdb.rust/unsized.exp fails: (gdb) ptype *us Structure has no component named operator*. rustc changed to emit a bit more debug info for unsized types. Because the original test is just to make sure that ptype of an unsized array looks right, this patch relaxes the regexp and changes the expression. I think this keeps the original test meaning, but also works with nightly. I also tested stable and 1.48.
2022-03-11Avoid crash with cross-linux core fileTom Tromey1-4/+4
An internal test case creates a core file using gcore, then restarts gdb with that core. When run with a cross-linux gdb (in this case, x86-64 host with ppc64-linux target), the test fails: | (gdb) core core | [New LWP 18437] | warning: `/lib64/libc.so.6': Shared library architecture unknown is not compatible with target architecture powerpc:common64. | warning: Could not load shared library symbols for /lib64/ld64.so.1. | Do you need "set solib-search-path" or "set sysroot"? | ../../src/gdb/gdbarch.c:3388: internal-error: int gdbarch_elf_make_msymbol_special_p(gdbarch*): Assertion `gdbarch != NULL' failed. | A problem internal to GDB has been detected, | further debugging may prove unreliable. | Quit this debugging session? (y or n) y What's happening here is that the core file lists some shared libraries. These aren't available via the solib search path, and so gdb finds the local (x86-64) libraries. This is not ideal, but on the other hand, it is what was asked for -- while the test does set solib-search-path, it does not set the sysroot. But, because gdb isn't configured to handle these libraries, it crashes. It seems to me that it's better to avoid the crash by having solib_bfd_open fail in the case where a library is incompatible. That is what this patch does. Now it looks like: | [New LWP 15488] | Error while mapping shared library sections: | `/lib64/libc.so.6': Shared library architecture unknown is not compatible with target architecture powerpc:common64. ... and does not crash gdb. I don't have a good setup for testing this using dejagnu, so I don't know whether an existing gdb test covers this scenario.
2022-03-11gdb/testsuite: remove duplicates from gdb.base/stap-probe.expAndrew Burgess1-6/+11
Remove the duplicate test names from gdb.base/stap-probe.exp, this is done by actually passing a unique test name in a couple of places (rather than using the command as the test name), and in another couple of places, a test has a duplicate name due to a cut & paste error, which I've fixed. There's no change in what is actually being tested after this commit.
2022-03-10gdb/auto-load: Remove repeating "auto-load" from debug messageAaron Merey1-2/+1
Remove "auto-load:" from a format string passed to auto_load_debug_printf. It is unnecessary since this function will prefix the string with "[auto-load]" when printing it.
2022-03-10Change how "print/x" displays floating-point valueTom Tromey6-23/+52
Currently, "print/x" will display a floating-point value by first casting it to an integer type. This yields weird results like: (gdb) print/x 1.5 $1 = 0x1 This has confused users multiple times -- see PR gdb/16242, where there are several dups. I've also seen some confusion from this internally at AdaCore. The manual says: 'x' Regard the bits of the value as an integer, and print the integer in hexadecimal. ... which seems more useful. So, perhaps what happened is that this was incorrectly implemented (or maybe correctly implemented and then regressed, as there don't seem to be any tests). This patch fixes the bug. There was a previous discussion where we agreed to preserve the old behavior: https://sourceware.org/legacy-ml/gdb-patches/2017-06/msg00314.html However, I think it makes more sense to follow the manual. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16242
2022-03-10Simplify the ui-out progress APITom Tromey2-8/+8
I noticed that 'progress' is a method on ui-out, but it seems to me that it would be better if the only API were via the progress_meter class. This patch makes this change, changing progress to be a method on the meter itself.
2022-03-10gdb/gdbarch: fix typo in gdbarch-components.pyAndrew Burgess1-1/+1
Fixes a minor typo, in a comment, in the gdbarch-components.py script. There should be no user visible changes after this commit.
2022-03-10Process exit status is leader exit status testcaseLancelot SIX2-0/+110
This adds a multi-threaded testcase that has all threads in the process exit with a different exit code, and ensures that GDB reports the thread group leader's exit status as the whole-process exit status. Before this set of patches, this would randomly report the exit code of some other thread, and thus fail. Tested on Linux-x86_64, native and gdbserver. Co-Authored-By: Pedro Alves <pedro@palves.net> Change-Id: I30cba2ff4576fb01b5169cc72667f3268d919557
2022-03-10Re-add zombie leader on exit, gdb/linuxPedro Alves1-27/+80
The current zombie leader detection code in linux-nat.c has a race -- if a multi-threaded inferior exits just before check_zombie_leaders finds that the leader is now zombie via checking /proc/PID/status, check_zombie_leaders deletes the leader, assuming we won't get an event for that exit (which we won't in some scenarios, but not in this one). That might seem mostly harmless, but it has some downsides: - later when we continue pulling events out of the kernel, we will collect the exit event of the non-leader threads, and once we see the last lwp in our list exit, we return _that_ lwp's exit code as whole-process exit code to infrun, instead of the leader's exit code. - this can cause a hang in stop_all_threads in infrun.c. Say there are 2 threads in the process. stop_all_threads stops each of those threads, and then waits for two stop or exit events, one for each thread. If the whole process exits, and check_zombie_leaders hits the false-positive case, linux-nat.c will only return one event to GDB (the whole-process exit returned when we see the last thread, the non-leader thread, exit), making stop_all_threads hang forever waiting for a second event that will never come. However, in this false-positive scenario, where the whole process is exiting, as opposed to just the leader (with pthread_exit(), for example), we _will_ get an exit event shortly for the leader, after we collect the exit event of all the other non-leader threads. Or put another way, we _always_ get an event for the leader after we see it become zombie. I tried a number of approaches to fix this: #1 - My first thought to address the race was to make GDB always report the whole-process exit status for the leader thread, not for whatever is the last lwp in the list. We _always_ get a final exit (or exec) event for the leader, and when the race triggers, we're not collecting it. #2 - My second thought was to try to plug the race in the first place. I thought of making GDB call waitpid/WNOHANG for all non-leader threads immediately when the zombie leader is detected, assuming there would be an exit event pending for each of them waiting to be collected. Turns out that that doesn't work -- you can see the leader become zombie _before_ the kernel kills all other threads. Waitpid in that small time window returns 0, indicating no-event. Thankfully we hit that race window all the time, which avoided trading one race for another. Looking at the non-leader thread's status in /proc doesn't help either, the threads are still in running state for a bit, for the same reason. #3 - My next attempt, which seemed promising, was to synchronously stop and wait for the stop for each of the non-leader threads. For the scenario in question, this will collect all the exit statuses of the non-leader threads. Then, if we are left with only the zombie leader in the lwp list, it means we either have a normal while-process exit or an exec, in which case we should not delete the leader. If _only_ the leader exited, like in gdb.threads/leader-exit.exp, then after pausing threads, we will still have at least one live non-leader thread in the list, and so we delete the leader lwp. I got this working and polished, and it was only after staring at the kernel code to convince myself that this would really work (and it would, for the scenario I considered), that I realized I had failed to account for one scenario -- if any non-leader thread is _already_ stopped when some thread triggers a group exit, like e.g., if you have some threads stopped and then resume just one thread with scheduler-locking or non-stop, and that thread exits the process. I also played with PTRACE_EVENT_EXIT, see if it would help in any way to plug the race, and I couldn't find a way that it would result in any practical difference compared to looking at /proc/PID/status, with respect to having a race. So I concluded that there's no way to plug the race, we just have to deal with it. Which means, going back to approach #1. That is the approach taken by this patch. Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
2022-03-10gdb: Reorganize linux_nat_filter_eventPedro Alves1-35/+40
Reorganize linux-nat.c:linux_nat_filter_event such that all the handling for events for LWPs not in the LWP list is together. This helps make a following patch clearer. The comments and debug messages have also been tweaked - the end goal is to have them synchronized with the gdbserver counterpart. Change-Id: I8586d8dcd76d8bd3795145e3056fc660e3b2cd22
2022-03-10Fix gdb.threads/current-lwp-dead.exp racePedro Alves2-37/+87
If we make GDB report the process EXIT event for the leader thread, as will be done in a latter patch of this series, then gdb.threads/current-lwp-dead.exp starts failing: (gdb) break fn_return Breakpoint 2 at 0x5555555551b5: file /home/pedro/rocm/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/current-lwp-dead.c, line 45. (gdb) continue Continuing. [New LWP 2138466] [Inferior 1 (process 2138459) exited normally] (gdb) FAIL: gdb.threads/current-lwp-dead.exp: continue to breakpoint: fn_return (the program exited) The inferior exit reported is actually correct. The main thread has indeed exited, and that's the thread that has the right exit code to report to the user, as that's the exit code that is reported to the program's parent. In this case, GDB managed to collect the exit code for the leader thread before reaping the other thread, because in reality, the testcase isn't creating standard threads, it is using raw clone, and the new clones are put in their own thread group. Fix it by making the main "thread" not exit until the scenario we're exercising plays out. Also, run the program to completion for completeness. The original program really wanted the leader thread to exit before the fn_return function was reached -- it was important that the current thread as pointed by inferior_ptid was gone when infrun got the breakpoint event. I've tweaked the testcase to ensure that that condition is still held, though it is no longer the main thread that exits. This required a bit of synchronization between the threads, which required using CLONE_VM unconditionally. The #ifdef guards were added as a fix for https://sourceware.org/bugzilla/show_bug.cgi?id=11214, though I don't think they were necessary because the program is not using TLS. If it turns out they were necessary, we can link the testcase with "-z now" instead, which was mentioned as an alternative workaround in that Bugzilla. Change-Id: I7be2f0da4c2fe8f80a60bdde5e6c623d8bd5a0aa
2022-03-10Fix gdb.threads/clone-new-thread-event.exp racePedro Alves2-1/+17
If we make GDB report the process EXIT event for the leader thread, instead of whatever is the last thread in the LWP list, as will be done in a latter patch of this series, then gdb.threads/current-lwp-dead.exp starts failing: (gdb) FAIL: gdb.threads/clone-new-thread-event.exp: catch SIGUSR1 (the program exited) This is a testcase race -- the main thread does not wait for the spawned clone "thread" to finish before exiting, so the main program may exit before the second thread is scheduled and reports its SIGUSR1. With the change to make GDB report the EXIT for the leader, the race is 100% reproducible by adding a sleep(), like so: --- c/gdb/testsuite/gdb.threads/clone-new-thread-event.c +++ w/gdb/testsuite/gdb.threads/clone-new-thread-event.c @@ -51,6 +51,7 @@ local_gettid (void) static int fn (void *unused) { + sleep (1); tkill (local_gettid (), SIGUSR1); return 0; } Resulting in: Breakpoint 1, main (argc=1, argv=0x7fffffffd418) at gdb.threads/clone-new-thread-event.c:65 65 stack = malloc (STACK_SIZE); (gdb) continue Continuing. [New LWP 3715562] [Inferior 1 (process 3715555) exited normally] (gdb) FAIL: gdb.threads/clone-new-thread-event.exp: catch SIGUSR1 (the program exited) That inferior exit reported is actually correct. The main thread has indeed exited, and that's the thread that has the right exit code to report to the user, as that's the exit code that is reported to the program's parent. In this case, GDB managed to collect the exit code for the leader thread before reaping the other thread, because in reality, the testcase isn't creating standard threads, it is using raw clone, and the new clones are put in their own thread group. Fix it by making the main thread wait for the child to exit. Also, run the program to completion for completeness. Change-Id: I315cd3dc2b9e860395dcab9658341ea868d7a6bf
2022-03-09GDB/testsuite: Fix a "displayed" typo in gdb.base/default.expMaciej W. Rozycki1-1/+1
Fix a typo, s/dislayed/displayed/ in default.exp at the top level.