aboutsummaryrefslogtreecommitdiff
path: root/gdb/dwarf2/read.h
AgeCommit message (Collapse)AuthorFilesLines
2025-06-17gdb/dwarf: rename get_cu -> get_unitSimon Marchi1-4/+4
This method returns type units too, so "get_unit" is a better name. Change-Id: I6ec9de3f783637a3e206bcaaec96a4e00b4b7d31 Approved-By: Tom Tromey <tom@tromey.com>
2025-05-23gdb: guard <mutex> include with CXX_STD_THREADSimon Marchi1-0/+2
Change-Id: I4335fbfdabe49778fe37b08689eec59be94c424b
2025-05-23gdb: include <mutex> in dwarf2/read.hSimon Marchi1-0/+1
The buildbot pointed out this compilation failure on AlmaLinux, with g++ 8.5.0, which is not seen on more recent systems: CXX gdbtypes.o In file included from ../../binutils-gdb/gdb/gdbtypes.c:39: ../../binutils-gdb/gdb/dwarf2/read.h:639:8: error: ‘mutex’ in namespace ‘std’ does not name a type std::mutex dwo_files_lock; ^~~~~ ../../binutils-gdb/gdb/dwarf2/read.h:639:3: note: ‘std::mutex’ is defined in header ‘<mutex>’; did you forget to ‘#include <mutex>’? ../../binutils-gdb/gdb/dwarf2/read.h:35:1: +#include <mutex> ../../binutils-gdb/gdb/dwarf2/read.h:639:3: std::mutex dwo_files_lock; ^~~ Fix it by including <mutex> in dwarf2/read.h. Change-Id: Iba334a3dad217c86841a5e804d0f386876f5ff2f
2025-05-23gdb/dwarf: split dwo_lock in more granular locksSimon Marchi1-0/+5
The dwo_lock mutex is used to synchronize access to some dwo/dwp-related data structures, such as dwarf2_per_bfd::dwo_files and dwp_file::loaded_{cus,tus}. Right now the scope of the lock is kind of coarse. It is taken in the top-level lookup_dwo_unit function, and held while the thread: - looks up an existing dwo_file in the per-bfd hash table for the given id/signature - if there's no existing dwo_file, attempt to find a .dwo file, open it, build the list of units it contains - if a new dwo_file was created, insert it in the per-bfd hash table - look up the desired unit in the dwo_file And something similar for the dwp code path. This means that two indexing thread can't read in two dwo files simultaneously. This isn't ideal in terms of parallelism. This patch breaks this lock into 3 more fine grained locks: - one lock to access dwarf2_per_bfd::dwo_files - one lock to access dwp_file::loaded_{cus,tus} - one lock in try_open_dwop_file, where we do two operations that aren't thread safe (bfd_check_format and gdb_bfd_record_inclusion) Unfortunately I don't see a clear speedup on my computer with 8 threads. But the change shouldn't hurt, in theory, and hopefully this can be a piece that helps in making GDB scale better on machines with many cores (if we ever bump the max number of worker threads). This patch uses "double-checked locking" to avoid holding the lock(s) for the whole duration of reading in dwo files. The idea is, when looking up a dwo with a given name: - with the lock held, check for an existing dwo_file with that name in dwarf2_per_bfd::dwo_files, if found return it - if not found, drop the lock, load the dwo file and create a dwo_file describing it - with the lock held, attempt to insert the new dwo_file in dwarf2_per_bfd::dwo_files. If an entry exists, it means another thread simultaneously created an equivalent dwo_file, but won the race. Drop the new dwo_file and use the existing one. The new dwo_file is automatically deleted, because it is help by a unique_ptr and the insertion into the unordered_set fails. Note that it shouldn't normally happen for two threads to look up a dwo file with the same name, since different units will point to different dwo files. But it were to happen, we handle it. This way of doing things allows two threads to read in two different dwo files simulatenously, which in theory should help get better parallelism. The same technique is used for dwp_file::loaded_{cus,tus}. I have some local CI jobs that run the fission and fission-dwp boards, and I haven't seen regressions. In addition to the regular testing, I ran a few tests using those boards on a ThreadSanitizer build of GDB. Change-Id: I625c98b0aa97b47d5ee59fe22a137ad0eafc8c25 Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2025-05-12gdb/dwarf: move loops into locate_dw{o,z}_sectionsSimon Marchi1-2/+1
For a subsequent patch, it would be easier if the loop over sections inside locate_dwo_sections (I want to maintain some state for the duration of the loop). Move the for loop in there. And because locate_dwz_sections is very similar, modify that one too, to keep both in sync. Change-Id: I90b3d44184910cc2d86af265bb4b41828a5d2c2e Approved-By: Tom Tromey <tom@tromey.com>
2025-04-29gdb/dwarf: avoid cutu_reader movesSimon Marchi1-2/+0
In process_psymtab_comp_unit and ensure_cu_exists, we create a temporary cutu_reader on the stack, then move it to a heap allocated cutu_reader once we confirmed the unit is not dummy. I think it's unnecessary to create a temporary cutu_reader. The only downside of not doing so is that if it ends up that the CU is dummy, we made an allocation/deallocation for nothing. Dummy CUs are a rare thing, it shouldn't change anything. This allows removing the cutu_reader move constructor. Change-Id: I44742d471c495055ee46db41c0e7bdfbd2d5c0b7 Approved-By: Tom Tromey <tom@tromey.com>
2025-04-29gdb/dwarf: scan .debug_info.dwo just onceSimon Marchi1-10/+3
When building -gsplit-dwarf and -fdebug-types-section in DWARF 5, the resulting .dwo files will typically have a .debug_info.dwo section with multiple type units followed by one compile unit: $ llvm-dwarfdump -F -color a-test.dwo | grep ' Unit' 0x00000000: Type Unit: length = 0x000008a0, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_type, abbr_offset = 0x0000, addr_size = 0x08, name = 'vector<int, std::allocator<int> >', type_signature = 0xb499dcf29e2928c4, type_offset = 0x0023 (next unit at 0x000008a4) 0x000008a4: Type Unit: length = 0x00000099, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_type, abbr_offset = 0x0000, addr_size = 0x08, name = 'allocator<int>', type_signature = 0x496a8791a842701b, type_offset = 0x0023 (next unit at 0x00000941) ... 0x000015c1: Compile Unit: length = 0x00000f58, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset = 0x0000, addr_size = 0x08, DWO_id = 0xe8e359820d1c5803 (next unit at 0x0000251d) In open_and_init_dwo_file, we call create_dwo_cus_hash_table, which scans the section, looking for compile units, then call create_dwo_debug_types_hash_table, which scans the section again, looking for type units. It would make more sense to scan the section just once and handle both compile and type units at the same time. To achieve this, add create_dwo_unit_hash_tables, which knows how to handle both unit kinds in a single scan. It replaces create_dwo_cus_hash_table and create_dwo_debug_type_hash_table. Change open_and_init_dwo_file to call it. Note that I removed the DWARF version check in open_and_init_dwo_file when processing .debug_type.dwo sections: in DWARF 5, the .debug_type.dwo sections will just not exist, so the `dwo_file->sections.types` vector will be empty. Change-Id: I6e51d0ca06c258e0bf0e59927d62ae2df314a162 Approved-By: Tom Tromey <tom@tromey.com>
2025-04-29gdb/dwarf: replace some "compile unit" terminology with "unit"Simon Marchi1-9/+9
In DWARF 5 (and even previous versions, with type units), compile units are just one type of units. In many places, we still use "compile units" when in reality it would be better to talk about "units" (unless we specifically want to talk about compile units). Rename comp-unit-head.{c.h} to unit-head.{c,h}, and do a big pass of renames in it to remove the specific mentions of compile units, where in fact we want to talk about units in general. Change-Id: Ia06c90ccb25756c366f269a12620f2f7c8378adb Approved-By: Tom Tromey <tom@tromey.com>
2025-04-22Handle DWARF 5 separate debug sectionsTom Tromey1-4/+4
DWARF 5 standardized the .gnu_debugaltlink section that dwz emits in multi-file mode. This is handled via some new forms, and a new .debug_sup section. This patch adds support for this to gdb. It is largely straightforward, I think, though one oddity is that I chose not to have this code search the system build-id directories for the supplementary file. My feeling was that, while it makes sense for a distro to unify the build-id concept with the hash stored in the .debug_sup section, there's no intrinsic need to do so. This in turn means that a few tests -- for example those that test the index cache -- will not work in this mode. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32808 Acked-By: Simon Marchi <simon.marchi@efficios.com>
2025-04-20gdb/dwarf: make some more functions methods of cutu_readerSimon Marchi1-0/+33
These are only used by cutu_reader, so make them methods of cutu_reader. This makes it a bit more obvious in which context this code is called. lookup_dwo_unit_in_dwp can't be made a method of cutu_reader, as it is used in another context (lookup_dwp_signatured_type / lookup_signatured_type), which happens during CU expansion. Change-Id: Ic62c3119dd6ec198411768323aaf640ed165f51b Approved-By: Tom Tromey <tom@tromey.com>
2025-04-20gdb/dwarf: look up .dwp file ahead of timeSimon Marchi1-3/+0
get_dwp_file lazily looks for a .dwp file for the given objfile. It is called by indexing workers, when a cutu_reader object looks for a DWO file. It is called with the "dwo_lock" held, meaning that the first worker to get there will do the work, while the others will wait at the lock. I'm trying to reduce the time where this lock is taken and do other refactorings to make it easier to reason about the DWARF reader code. Moving the lookup of the .dwp file ahead, before we start parallelizing work, helps makes things simpler, because we can then assume everywhere else that we have already checked for a .dwp file. Put the call to open_and_init_dwp_file in dwarf2_has_info, right next to where we look up .dwz files. I used the same try-catch pattern as for the .dwz file lookup. Change-Id: I615da85f62a66d752607f0dbe9f0372dfa04b86b Approved-By: Tom Tromey <tom@tromey.com>
2025-04-08Update copyright dates to include 2025Tom Tromey1-1/+1
This updates the copyright headers to include 2025. I did this by running gdb/copyright.py and then manually modifying a few files as noted by the script. Approved-By: Eli Zaretskii <eliz@gnu.org>
2025-04-03gdb/dwarf: rename cache -> abbrev_cacheSimon Marchi1-1/+1
"cache" is just a bit too generic to be clear. Change-Id: I8bf01c5fe84e076af1afd2453b1a115777630271
2025-03-26gdb/dwarf: use reference in cutu_reader::cutu_reader interfaceSimon Marchi1-6/+6
Change some parameters to be references instead of pointers, when the value must not be nullptr. I'd like to do this more of this kind of change, but I have to limit the scope of the change, otherwise there's just no end (and some local variables could also be turned into references). So for now, just do it the cutu_reader constructors. Change-Id: I9442c6043726981d58f9b141f516c590c0a71bcc Approved-By: Tom Tromey <tom@tromey.com>
2025-03-24gdb/dwarf: remove cutu_reader::read_cutu_die_from_dwo abbrev table parameterSimon Marchi1-2/+1
This parameter is always used to set cutu_reader::m_dwo_abbrev_table. Remove the parameter, and have read_cutu_die_from_dwo set the field directly. Change-Id: I6c0c7d23591fb2c3d28cdea1befa4e6b379fd0d3 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-18gdb/dwarf: remove type_unit_groupSimon Marchi1-23/+29
The type_unit_group is an indirection between a stmt_list_hash (possible dwo_unit + line table section offset) and a type_unit_group_unshareable that provides no real value. In dwarf2_per_objfile, we maintain a stmt_list_hash -> type_unit_group mapping, and in dwarf2_per_objfile, we maintain a type_unit_group_unshareable mapping. The type_unit_group type is empty and only exists to have an identity and to be a link between the two mappings. This patch changes it so that we have a single stmt_list_hash -> type_unit_group_unshareable mapping. Regression tested on Debian 12 amd64 with a bunch of DWARF target boards. Change-Id: I9c5778ecb18963f353e9dd058e0f8152f7d8930c Approved-By: Tom Tromey <tom@tromey.com>
2025-03-18gdb/dwarf: use gdb::unordered_map for ↵Simon Marchi1-6/+15
dwarf2_per_bfd::{quick_file_names_table,type_unit_groups} Change these two hash tables to use gdb::unordered_map. I changed these two at the same time because they both use the same key, a stmt_list_hash. Unlike other previous patches that used a gdb::unordered_set, use an unordered_map here because the key isn't found in the element itself (well, it was before, because of how htab works, but it didn't need to be). You'll notice that the type_unit_group structure is empty. That structure isn't really needed. It is removed in the following patch. Regression tested on Debian 12 amd64 with a bunch of DWARF target boards. Change-Id: Iec2289958d0f755cab8198f5b72ecab48358ba11 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-18Introduce and use attribute::unsigned_constantTom Tromey1-1/+1
This introduces a new 'unsigned_constant' method on attribute. This method can be used to get the value as an unsigned number. Unsigned scalar forms are handled, and signed scalar forms are handled as well provided that the value is non-negative. Several spots in the reader that expect small DWARF-defined constants are updated to use this new method. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32680 Approved-By: Simon Marchi <simon.marchi@efficios.com>
2025-03-18gdb/dwarf: set m_top_level_die directly in read_cutu_die_from_dwoSimon Marchi1-1/+0
read_cutu_die_from_dwo currently returns the dwo's top-level DIE through a parameter. Following the previous patch, all code paths end up setting m_top_level_die. Simplify this by having read_cutu_die_from_dwo set m_top_level_die directly. I think it's easier to understand, because there's one less indirection to follow. Change-Id: Ib659f1d2e38501a8fe2b5dd0ca2add3ef55e8d60 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-18gdb/dwarf: fix spurious error when encountering dummy CUSimon Marchi1-5/+5
I built an application with -gsplit-dwarf (i.e. dwo), and some CUs are considered "dummy" by the DWARF reader. That is, the top-level DIE (DW_TAG_compile_unit) does not have any children. Here's the skeleton: 0x0000c0cb: Compile Unit: length = 0x0000001d, format = DWARF32, version = 0x0005, unit_type = DW_UT_skeleton, abbr_offset = 0x529b, addr_size = 0x08, DWO_id = 0x0ed2693dd2a756dc (next unit at 0x0000c0ec) 0x0000c0df: DW_TAG_skeleton_unit DW_AT_stmt_list [DW_FORM_sec_offset] (0x09dee00f) DW_AT_dwo_name [DW_FORM_strp] ("CMakeFiles/lib_crl.dir/crl/dispatch/crl_dispatch_queue.cpp.dwo") DW_AT_comp_dir [DW_FORM_strp] ("/home/simark/src/tdesktop/build-relwithdebuginfo-split-nogz/Telegram/lib_crl") DW_AT_GNU_pubnames [DW_FORM_flag_present] (true) And here's the entire debug info in the .dwo file: .debug_info.dwo contents: 0x00000000: Compile Unit: length = 0x0000001a, format = DWARF32, version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset = 0x0000, addr_size = 0x08, DWO_id = 0x0ed2693dd2a756dc (next unit at 0x0000001e) 0x00000014: DW_TAG_compile_unit DW_AT_producer [DW_FORM_strx] ("GNU C++20 14.2.1 20250207 -mno-direct-extern-access -mtune=generic -march=x86-64 -gsplit-dwarf -g3 -gz=none -O2 -std=gnu++20 -fPIC -fno-strict-aliasing") DW_AT_language [DW_FORM_data1] (DW_LANG_C_plus_plus_14) DW_AT_name [DW_FORM_strx] ("/home/simark/src/tdesktop/Telegram/lib_crl/crl/dispatch/crl_dispatch_queue.cpp") DW_AT_comp_dir [DW_FORM_strx] ("/home/simark/src/tdesktop/build-relwithdebuginfo-split-nogz/Telegram/lib_crl") When loading the binary in GDB, I see some warnings: $ ./gdb -q -nx --data-directory=data-directory -ex 'maint set dwarf sync on' -ex "file /home/simark/src/tdesktop/build-relwithdebuginfo-split-nogz/telegram-desktop" Reading symbols from /home/simark/src/tdesktop/build-relwithdebuginfo-split-nogz/telegram-desktop... DWARF Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0xc0cb DWARF Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0xc152 DWARF Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0xc194 DWARF Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0xc1b5 (gdb) It turns out that these errors are not really justified. What happens is: - cutu_reader::read_cutu_die_from_dwo return 0, indicating that the CU is "dummy" - back in cutu_reader::cutu_reader, we omit setting m_top_level_die to the DIE from the dwo file, meaning that m_top_level_die keeps pointing to the DIE from the main file (DW_TAG_skeleton_unit) - later, in cutu_reader::prepare_one_comp_unit, there is a check that m_top_level_die->tag is one of DW_TAG_{compile,partial,type}_unit, which triggers My proposal to fix this is to set m_top_level_die even if the CU is dummy. Even if the top-level DIE does not have any children, I don't see any reason to leave cutu_reader::m_top_level_die in a different state than when the CU is not dummy. While at it, set m_dummy_p directly in read_cutu_die_from_dwo, instead of returning a value and having the caller do it. This is all inside cutu_reader anyway. Change-Id: I483a68a369bb461a8dfa5bf2106ab1d6a0067198 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-17gdbsupport: add some -Wunused-* warning flagsSimon Marchi1-1/+0
Add a few -Wunused-* diagnostic flags that look useful. Some are known to gcc, some to clang, some to both. Fix the fallouts. -Wunused-const-variable=1 is understood by gcc, but not clang. -Wunused-const-variable would be undertsood by both, but for gcc at least it would flag the unused const variables in headers. This doesn't make sense to me, because as soon as one source file includes a header but doesn't use a const variable defined in that header, it's an error. With `=1`, gcc only warns about unused const variable in the main source file. It's not a big deal that clang doesn't understand it though: any instance of that problem will be flagged by any gcc build. Change-Id: Ie20d99524b3054693f1ac5b53115bb46c89a5156 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: remove some _1 suffixesSimon Marchi1-2/+2
These methods don't have (or no longer have) a counterpart without the _1 suffix, so remove the suffix. Change-Id: Ifdfe4fb3b6b09c6bb9e30c27acf9f9ecbcb207f2 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: remove cutu_reader::keep, add cutu_reader::release_cuSimon Marchi1-3/+2
This is a bit subjective, but I often struggle to understand what cutu_reader::keep is meant to do (keep what, where). Perhaps it's just a question of bad naming, but I think it's a bit confusing for cutu_reader to transfer the ownership of the dwarf2_cu to the per_objfile directly. Add the cutu::release_cu method and make the caller of cutu_reader transfer the ownership to the per_objfile object. Right now, it is theoretically possible for release_cu to return nullptr, so I made callers check the return value. A patch later in this series will change release_cu to ensure it always return non-nullptr, so those callers will get simplified. Change-Id: I3103ff894d1654a95c9d69001073c218501c988a Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: change cutu_reader::read_die_and_siblings to ↵Simon Marchi1-1/+2
cutu_reader::read_all_dies After construction of a cutu_reader, only the top-level DIE has been read in memory. If the caller wants to access the full DIE tree, it does: reader.top_level_die ()->child = reader.read_die_and_siblings (reader.top_level_die ()); I don't really like this poking into cutu_reader's data structures from the outside, I would prefer if that work was done by cutu_reader. Rename the read_die_and_siblings method to read_all_dies, and do that work inside cutu_reader. I also moved these operations inside the read_all_dies method: gdb_assert (cu->die_hash.empty ()); cu->die_hash.reserve (cu->header.get_length_without_initial () / 12); ... cu->dies = reader.top_level_die (); The rationale for this is that read_all_dies (and the functions it calls) is responsible for filling the die_hash set. So I think it makes sense for it to do the reserve. It is also cutu_reader's job, currently, to create and fill the fields of dwarf2_cu. So I think it makes sense for it to set cu->dies, after having read the DIEs in memory. Change-Id: I088c2e0b367db7d1f67e8c9e2d5b0d61165292fc Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: access m_info_ptr directly instead of passing info_ptr aroundSimon Marchi1-14/+7
The few methods of cutu_reader that read DIEs into memory generally receive an info_ptr that says where to start reading and return another one (either by return value or parameter) indicating where the caller should continue reading. We can avoid all this passing around by having these methods access m_info_ptr directly. This allows changing some methods that read DIEs to return `die_info *`, instead of returning it by parameter, which just makes the code simpler to read, I think. The only method that meaningfully reads and writes m_info_ptr (except the places that initially set it up) is read_full_die_1. It reads and increments m_info_ptr once to read the abbrev and once again to read each attribute. Other methods use it for logging. The methods cutu_reader::read_attribute and cutu_reader::read_attribute_value do not touch m_info_ptr directly, because they are used in cooked-indexer.c, which appears to read some things in a non-linear fashion, unlike cutu_reader's DIE-reading methods. The cooked indexer calls cutu_reader::info_ptr to get the m_info_ptr value just after the top-level DIE, and then it does its own attribute reading after that. Change-Id: I251f63d13d453a2827b21349760da033171880e2 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: factor out to cutu_reader::skip_one_attribute methodSimon Marchi1-0/+5
I was reading cutu_reader::skip_one_die, and thought that the code to skip one attribute made it quite difficult to read. Factor this code out to a new method, to get it out of the way. As a bonus, it transforms one goto in a recursion call, which is also easier to follow. Unfortunately, I have no idea how to test DW_FORM_indirect, as it doesn't seem to appear anywhere in the testsuite, and I don't think that compilers often emit that. Change-Id: I2257b3e594aafb7c7da52ddd55baa651cefb802f Approved-By: Tom Tromey <tom@tromey.com>
2025-03-14gdb/dwarf: remove pretend_language parameter from load_full_{comp,type}_unitSimon Marchi1-6/+2
I noticed that load_full_comp_unit and load_full_type_unit didn't use their pretend_language parameter. Remove them, and then remove more things that were needed to get the language value to that point, including the dwarf2_queue_item field. Change-Id: Ie8cb21c54ae49da065a1b0a20bf18ccb93961d1a Approved-By: Tom Tromey <tom@tromey.com>
2025-03-10Revert past commitsSimon Marchi1-19/+36
I accidentally pushed my work-in-progress branch... revert that. Sorry for the noise :(. The list of commits reverted are: ae2a50a9ae15 attempt to revamp to the CU/TU list e9386435c94f gdb/dwarf: print DWARF CUs/TUs in "maint print objfiles" 6cbd64aa3eb0 gdb/dwarf: add dwarf_source_language_name 32a187da7622 libiberty: move DW_LANG_* definitions to dwarf2.def b3fa38aef59d gdb/dwarf: move index unit vectors to debug names reader and use them 30ba74418982 gdb/dwarf: track comp and type units count bedb4e09f292 gdb/dwarf: remove unnecessary braces b4f18de12c77 gdb/dwarf: use ranged for loop in some pots Change-Id: I80aed2847025f5b15c16c997680783b39858a703
2025-03-10attempt to revamp to the CU/TU listSimon Marchi1-19/+17
Change-Id: I1c8214413583d540c10c9a2322ef2a21f8bb54e7
2025-03-10gdb/dwarf: move index unit vectors to debug names reader and use themSimon Marchi1-17/+0
Since these vectors contain the CU and TU lists as found in the .debug_names header, it seems like they are meant to be used by the .debug_names reader when handling a DW_IDX_compile_unit or DW_IDX_type_unit attribute. The value of the attribute would translate directly into an index into one of these vectors. However there's something fishy: it looks like these vectors aren't actually used in practice. They are used in the dwarf2_per_bfd::get_index_{c,t}u methods, which in turn aren't used anywhere. The handlers of DW_IDX_compile_unit and DW_IDX_type_unit use the dwarf2_per_bfd::get_cu method, assuming that all compile units are placed before type units in the dwarf2_per_bfd::all_units vector. I see several problems with that: 1. I found out [1] that the dwarf2_per_bfd::all_units didn't always have the CUs before the TUs. So indexing dwarf2_per_bfd::all_units with that assumption will not work. 2. The dwarf2_find_containing_comp_unit function assumes an ordering of units by section offset (among other criteria) in order to do a binary search. Even though it's probably commonly the case, nothing guarantees that the order of CUs and TUs in the .debug_names header (which defines the indices used to refer to them) will be sorted by section offset. It's not possible to make dwarf2_find_containing_comp_unit (assuming it wants to do a binary search by section offset) and the DW_IDX_compile_unit / DW_IDX_type_unit handlers use the same vector. 3. I have not tested this, but in the presence of a dwz supplementary file, the .debug_names reader should probably not put the units from the main and dwz files in the same vectors to look them up by index. Presumably, if both the main and dwz files have a .debug_names index, they have distinct CU / TU lists. So, an CU index of 1 in an index entry in the main file would refer to a different CU than an index of 1 in an index entry in the dwz file. The current code doesn't seem to account for that, it just indexes dwarf2_per_bfd::all_units. Since those vectors are kind of specific to the .debug_names reader, move them there, in the mapped_debug_names_reader struct. Then, update the handlers of DW_IDX_compile_unit and DW_IDX_type_unit to use them. [1] https://inbox.sourceware.org/gdb-patches/87a5ab5i5m.fsf@tromey.com/T/#mbdcfe35f94db33e59500eb0d3d225661cab016a4 Change-Id: I3958d70bb3875268143471da745aa09336ab2500
2025-03-10gdb/dwarf: track comp and type units countSimon Marchi1-1/+3
A subsequent commit will remove the all_comp_units and all_type_units array views, since the all_units vector will no longer be segmented between comp and type units. Some callers still need to know the number of each kind, so track that separately. Change-Id: I6ef184767a96e5be095bbf9142aa850adbb083ac
2025-03-10gdb/dwarf: save DWARF version in dwarf2_loclist_baton, remove it from ↵Simon Marchi1-21/+0
dwarf2_per_cu When running: $ make check TESTS="gdb.cp/cpexprs-debug-types.exp" RUNTESTFLAGS="--target_board=fission" I get: (gdb) break -qualified main /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.h:295: internal-error: version: Assertion `m_dwarf_version != 0' failed. The problem is that dwarf2_per_cu objects created in the read_cutu_die_from_dwo code path never have their DWARF version set. A seemingly obvious solution would be to add a call to dwarf2_per_cu::set_version in there (there's a patch in the referenced PR that does that). However, this comment in read_comp_units_from_section is a bit scary: /* Init this asap, to avoid a data race in the set_version in cutu_reader::cutu_reader (which may be run in parallel for the cooked index case). */ this_cu->set_version (cu_header.version); I don't know if a DWO file can be read while the cooked indexer runs, so if it would be a problem here, but I prefer to be safe than sorry. This patch side-steps the problem by deleting the DWARF version from dwarf2_per_cu. The only users of dwarf2_per_cu::version are the loclists callbacks in `loc.c`. Add the DWARF version to dwarf2_loclist_baton and modify those callbacks to get the version from there instead. Initialize that new field in fill_in_loclist_baton. I like this approach because there is no version field that is possibly unset now. I wasn't keen on doing this at first because I thought it would waste space, but the dwarf2_loclist_baton has 7 bytes of padding at the end anyway, so we might as well use that. Cc: Ricky Zhou <ricky@rzhou.org> Cc: Tom de Vries <tdevries@suse.de> Cc: Tom Tromey <tom@tromey.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32309 Change-Id: I30d4ede7d67da5d80ff65c6122f5868e1098ec52 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-10gdb/dwarf: rename comp_unit_die to top_level_dieSimon Marchi1-2/+2
The name "comp_unit_die" is a bit misleading, because it can also represent a type unit (DW_TAG_type_unit). I think that "top_level_die" is clear. Change-Id: Ibaac99897f0ac7499f0f82caeed3385e1e6ee870 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-10gdb/dwarf: add doc for cutu_reader::is_dummySimon Marchi1-0/+5
Change-Id: Ifb80557187c12822bdea7ad400c32c3dce968a7f Approved-By: Tom Tromey <tom@tromey.com>
2025-03-07gdb/dwarf: call other cutu_reader constructor in ensure_lang and ↵Simon Marchi1-2/+2
dw2_get_file_names PR 32742 shows this failing: $ make check TESTS="gdb.ada/access_to_unbounded_array.exp" RUNTESTFLAGS="--target_board=fission" Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.ada/access_to_unbounded_array.exp ... FAIL: gdb.ada/access_to_unbounded_array.exp: scenario=all: gdb_breakpoint: set breakpoint at foo.adb:23 (GDB internal error) Or, interactively: $ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.ada/access_to_unbounded_array/foo-all -ex 'b foo.adb:23' -batch /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:19567: internal-error: set_lang: Assertion `old_value == language_unknown || old_value == language_minimal || old_value == lang' failed. The symptom is that for a given dwarf2_per_cu, the language gets set twice. First, set to `language_ada`, and then, to `language_minimal`. It's unexpected for the language of a CU to get changed like this. The CU at offset 0x0 in the main file looks like: 0x00000000: Compile Unit: length = 0x00000030, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000034) 0x0000000b: DW_TAG_compile_unit DW_AT_low_pc [DW_FORM_addr] (0x000000000000339a) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000432) DW_AT_stmt_list [DW_FORM_sec_offset] (0x00000000) DW_AT_GNU_dwo_name [DW_FORM_strp] ("b~foo.dwo") DW_AT_comp_dir [DW_FORM_strp] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array") DW_AT_GNU_pubnames [DW_FORM_flag_present] (true) DW_AT_GNU_addr_base [DW_FORM_sec_offset] (0x00000000) DW_AT_GNU_dwo_id [DW_FORM_data8] (0x277aee54e7bd47f7) This refers to the DWO file b~foo.dwo, whose top-level DIE is: .debug_info.dwo contents: 0x00000000: Compile Unit: length = 0x00000b63, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000b67) 0x0000000b: DW_TAG_compile_unit DW_AT_producer [DW_FORM_GNU_str_index] ("GNU Ada 14.2.1 20250207 -fgnat-encodings=minimal -gdwarf-4 -fdebug-types-section -fuse-ld=gold -gnatA -gnatWb -gnatiw -gdwarf-4 -gsplit-dwarf -ggnu-pubnames -gnatws -mtune=generic -march=x86-64") DW_AT_language [DW_FORM_data1] (DW_LANG_Ada95) DW_AT_name [DW_FORM_GNU_str_index] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array/b~foo.adb") DW_AT_comp_dir [DW_FORM_GNU_str_index] ("/home/simark/build/binutils-gdb/gdb/testsuite/outputs/gdb.ada/access_to_unbounded_array") DW_AT_GNU_dwo_id [DW_FORM_data8] (0xdbeffefab180a2cb) The thing to note is that the language attribute is only present in the DIE in the DWO file, not on the DIE in the main file. The first time the language gets set is here: #0 dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_ada, dw_lang=DW_LANG_Ada95) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788 #1 0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff10bf2b0, cu=0x51700008e000, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029 #2 0x000055556159f740 in cutu_reader::cutu_reader (this=0x7ffff10bf2b0, this_cu=0x50f0000044b0, per_objfile=0x516000066080, abbrev_table=0x510000004640, existing_cu=0x0, skip_partial=false, pretend_language=language_minimal, cache=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3371 #3 0x00005555615a547a in process_psymtab_comp_unit (this_cu=0x50f0000044b0, per_objfile=0x516000066080, storage=0x7ffff11b95e0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3799 #4 0x00005555615a9292 in cooked_index_worker_debug_info::process_cus (this=0x51700008dc80, task_number=0, first=std::unique_ptr<dwarf2_per_cu> = {...}, end=std::unique_ptr<dwarf2_per_cu> = {...}) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:4122 In this code path (particularly this specific cutu_reader constructir), the work is done to find and read the DWO file. So the language is properly identifier as language_ada, all good so far. The second time the language gets set is: #0 dwarf2_per_cu::set_lang (this=0x50f0000044b0, lang=language_minimal, dw_lang=0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:20788 #1 0x0000555561666af6 in cutu_reader::prepare_one_comp_unit (this=0x7ffff0f42730, cu=0x517000091b80, pretend_language=language_minimal) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:21029 #2 0x00005555615a1822 in cutu_reader::cutu_reader (this=0x7ffff0f42730, this_cu=0x50f0000044b0, per_objfile=0x516000066080, pretend_language=language_minimal, parent_cu=0x0, dwo_file=0x0) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3464 #3 0x000055556158c850 in dw2_get_file_names (this_cu=0x50f0000044b0, per_objfile=0x516000066080) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:1956 #4 0x000055556158f4f5 in dw_expand_symtabs_matching_file_matcher (per_objfile=0x516000066080, file_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:2157 #5 0x00005555616329e2 in cooked_index_functions::expand_symtabs_matching (this=0x50200002ab50, objfile=0x516000065780, file_matcher=..., lookup_name=0x0, symbol_matcher=..., expansion_notify=..., search_flags=..., domain=..., lang_matcher=...) at /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:15912 #6 0x0000555562ca8a14 in objfile::map_symtabs_matching_filename (this=0x516000065780, name=0x50200002ad90 "break pck.adb", real_path=0x0, callback=...) at /home/smarchi/src/binutils-gdb/gdb/symfile-debug.c:207 #7 0x0000555562d68775 in iterate_over_symtabs (pspace=0x513000005600, name=0x50200002ad90 "break pck.adb", callback=...) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:727 Here, we use the other cutu_reader constructor, the one that does not look up the DWO file for the passed CU. If a DWO file exists for this CU, the caller is expected to pass it as a parameter. That cutu_reader constructor also ends up setting the language of the CU. But because it didn't read the DWO file, it didn't figure out the language is language_ada, so it tries to set the language to the default, language_minimal. A question is: why do we end up trying to set the CU's language is this context. This is completely unrelated to what we're trying to do, that is get the file names from the line table. Setting the language is a side-effect of just constructing a cutu_reader, which we need to look up attributes in dw2_get_file_names_reader. There are probably some cleanups to be done here, to avoid doing useless work like looking up and setting the CU's language when all we need is an object to help reading the DIEs and attributes. But that is future work. The same cutu_reader constructor is used in `dwarf2_per_cu::ensure_lang`. Since this is the version of cutu_reader that does not look up the DWO file, it will conclude that the language is language_minimal and set that as the CU's language. In other words, `dwarf2_per_cu::ensure_lang` will get the language wrong, pretty ironic. Fix this by using the other cutu_reader constructor in those two spots. Pass `per_objfile->get_cu (this_cu)`, as the `existing_cu` parameter. I think this is necessary, because that constructor has an assert to check that if `existing_cu` is nullptr, then there must not be an existing `dwarf2_cu` in the per_objfile. To avoid getting things wrong like this, I think that the second cutu_reader constructor should be reserved for the spots that do pass a non-nullptr dwo_file. The only spot at the moment in create_cus_hash_table, where we read multiple units from the same DWO file. In this context, I guess it makes sense for efficiency to get the dwo_file once and pass it down to cutu_reader. For that constructor, make the parameters non-optional, add "non-nullptr" asserts, and update the code to assume the passed values are not nullptr. What I don't know is if this change is problematic thread-wise, if the functions I have modified to use the other cutu_reader constructor can be called concurrently in worker threads. If so, I think it would be problematic. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32742 Change-Id: I980d16875b9a43ab90e251504714d0d41165c7c8 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-07gdb/dwarf: move cooked_indexer to cooked-indexer.{h,c}Simon Marchi1-0/+87
Move the cooked_indexer class declaration to a new cooked-indexer.h file, and the implementation to cooked-indexer.c. Change-Id: Ibff3b06045b2af65fa9516097acf732d7c2d9414 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-07gdb/dwarf: move cutu_reader to read.hSimon Marchi1-0/+148
In order to move some things outside of read.c, cutu_reader needs to be in a header file. Change-Id: Ib26d7949c55867848d109332caf2efb1a6e72923 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-05gdb/dwarf: store dwo_file_up in dwo_file_setSimon Marchi1-5/+8
Heap-allocated dwo_file objects, stored in dwarf2_per_bfd::dwo_files, are never freed. They are created in one of the create_dwo_unit_in_dwp_* or lookup_dwo_cutu functions. I confirmed this by running: $ make check TESTS="gdb.cp/anon-ns.exp" RUNTESTFLAGS="--target_board=fission-dwp" $ ./gdb -q -nx --data-directory=data-directory testsuite/outputs/gdb.cp/anon-ns/anon-ns -ex "p main" -ex "file" -batch ... and checking the ASan leak report. I also debugged this invocation of GDB, placed a breakpoint on ~dwo_file, and didn't see any hit. Change the dwo_file set to hold dwo_file_up objects. When the dwarf2_per_bfd object gets destroyed, dwo_file objects will automatically get destroyed. With this change, I see the related leaks disappear in the ASan leak report, and my ~dwo_file breakpoint gets hit when debugging GDB. Change-Id: Icb38539c3f9e553f3625c625a00fc63dd6e9f3c5 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-05gdb/dwarf: make dwarf2_per_bfd::dwo_files a gdb::unordered_setSimon Marchi1-3/+42
Change the dwarf2_per_bfd::dwo_files htab to a gdb::unordered_set. No behavior change expected, except maybe the failure case in lookup_dwo_cutu. If open_and_init_dwo_file returns nullptr, the previous code would leave the slot value empty (nullptr). Is this legit? With the new hash table, the only thing we can do really is not attempt to insert the nullptr value. Change-Id: I63992f388b1197e696ded4ea483634e8ae67fce4 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-04gdb/dwarf: pass is_dwz to dwarf2_per_cu constructorSimon Marchi1-5/+7
It is always known at construction time whether a dwarf2_per_cu is built to represent a unit from a dwz file or not, so pass that information through the constructor. Change-Id: I278c1894ed606451aad02e830085190bb724c473 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-04gdb/dwarf: make dwarf2_get_dwz_file a method of dwarf2_per_bfdSimon Marchi1-0/+20
dwarf2_get_dwz_file looks more or less like a simple getter of dwarf2_per_bfd::dwz_file, so make it into a method. I typically avoid the `get_` prefix for getters, but that would conflict with the field name here. Change-Id: Idd0d5b1bd3813babf438b20aac514b19c77cfc18 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-04gdb/dwarf: remove create_cu_from_index_listSimon Marchi1-7/+0
I noticed that create_cu_from_index_list is only used in read-gdb-index.c, so I started by moving it there. But given that this function is use at only one spot and doesn't do much, I opted to inline its code in the caller instead. Change-Id: Iebe0dc20d345fa70a2f11aa9ff1a04fe26a31407 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-03gdb/dwarf: remove unnecessary abfd parameter in dwarf2_per_bfd::locate_sectionsSimon Marchi1-2/+1
The parameter `abfd` is always the same as `this->obfd`, there is no need to pass it as a parameter. Change-Id: If7ad58ad4efdf6b070cbf2b8a73436bd8b452fa6 Approved-By: Tom Tromey <tom@tromey.com>
2025-03-03gdb/dwarf: rename dwarf2_per_cu_data -> dwarf2_per_cuSimon Marchi1-65/+62
This scratches an itch I had for a while. I don't know why this struct type has "data" in its name. Others like "dwarf2_per_objfile" and "dwarf2_per_bfd" don't. The primary job of a structure is to hold data, there's no need to specify it. It also makes the name a bit shorter, which is always nice. Rename related types too. Change-Id: Ifb63195ff105809fc15b502f639c0bb4d18a675e Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Guinevere Larsen <guinevere@redhat.com>
2025-02-28gdb/dwarf: add dwarf2_per_bfd::filename and use it where possibleSimon Marchi1-0/+4
I noticed we quite often use: bfd_get_filename (per_bfd->obfd) Add a shortcut for that. Change-Id: I4e33925a481fd44088386510b01936d38e1d7d38 Approved-By: Andrew Burgess <aburgess@redhat.com>
2025-02-27gdb/dwarf: pass unit length to dwarf2_per_cu_data constructorSimon Marchi1-4/+9
Most of the time, the length of a unit is known when constructing a dwarf2_per_cu_data or signatured_type. Add a construtor parameter for it. In the few cases where it isn't, pass 0, which leaves it unset. Change-Id: I0c8b9683164d3e3f3823ed995f71689a4d17fd96 Reviewed-By: Tom de Vries <tdevries@suse.de>
2025-02-25gdb/dwarf: fix signature_type created with nullptr sectionSimon Marchi1-0/+2
Commit c44ab627b02 ("gdb/dwarf: pass section to dwarf2_per_cu_data constructor") introduced a regression when using dwp. It can be reproduced with: $ make check TESTS="gdb.base/ptype-offsets.exp" RUNTESTFLAGS="--target_board=fission-dwp" Then, to investigate: $ ./gdb -nx -q --data-directory=data-directory testsuite/outputs/gdb.base/ptype-offsets/ptype-offsets -ex 'ptype int' Reading symbols from testsuite/outputs/gdb.base/ptype-offsets/ptype-offsets... /home/smarchi/src/binutils-gdb/gdb/dwarf2/read.c:3195:38: runtime error: member call on null pointer of type 'struct dwarf2_section_info' Commit c44ab627b02 removed the assignment of signatured_type::section (dwarf2_per_cu_data::section, really) in fill_in_sig_entry_from_dwo_entry with the justification that the section was already set when constructing the signatured_type. Well, that was true except for one spot in lookup_dwp_signatured_type which passes a nullptr section to add_type_unit. Fix that by passing the section to add_type_unit in that one spot. This is the same section that would have been set by fill_in_sig_entry_from_dwo_entry before. Add some asserts in the dwarf2_per_cu_data constructor to verity that the passed dwarf2_per_bfd and dwarf2_section_info are non-nullptr. Change-Id: If27dae6b4727957c96defc058c7e4be31472005b Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32739 Co-Authored-By: Tom de Vries <tdevries@suse.de> Approved-By: Tom Tromey <tom@tromey.com>
2025-02-25gdb/dwarf: convert dwarf2_per_bfd::signatured_types to a gdb::unordered_setSimon Marchi1-7/+36
I'd like to add these asserts in dwarf2_per_cu_data's constructor: gdb_assert (per_bfd != nullptr); gdb_assert (section != nullptr); However, these dummy instances of signatured_type, used as hash table lookup keys, are in the way: signatured_type find_sig_entry (nullptr, nullptr, sect_offset {}, sig); This motivated me to convert the dwarf2_per_bfd::signatured_types hash table to a gdb::unordered_set. This allows finding an entry by simply passing the signature (an integer) and removes the need to create dummy signatured_type objects. There is one small unfortunate pessimization: lookup_dwo_signatured_type, for instance, does a lookup by signature to find an existing signatured_type. If not found, it adds a new one by calling add_type_unit. The current code passes down the slot where to put the new element, avoiding an extra hash when inserting the new signatured_type. With the new code, I don't see a way to do that. This is probably negligible, but it bugs me. If we used a map of signature -> signatured_type, I would see a way, but then it would waste space. I see one change in behavior, that is not really important IMO. In read_comp_units_from_section, if duplicate signature type entries are detected, the code prior to this patch overwrites the existing signatured_type in the hash table with the new one. With this patch, the existing signatured_type is kept: the call to emplace does not insert the value if an existing equal value exists. Other than that, no behavior change expected. Change-Id: I0bef1b49cc63dbdf4e32fa9d4ea37f83025efc3e Approved-By: Tom Tromey <tom@tromey.com>
2025-02-25gdb: move "gdb:function_view" into quick-symbol.h typedefsSimon Marchi1-6/+5
All users of these typedefs use them inside a gdb::function_view. Move the gdb::function_view in the typedefs themselves. This shortens the types in function signatures and helps with readability, IMO. Rename them to remove the `_ftype` suffix: this suffix is not as relevant in C++ as it was in C. With function_view, the caller can pass more than just a simple "function". Anyway, I think it's clearer to name them after the role the callback has (listener, matcher, etc). Adjust some related comments. Change-Id: Iaf9f8ede68b51ea9e4d954792e8eb90def8659a6 Approved-By: Tom Tromey <tom@tromey.com>
2025-02-25gdb/dwarf: initialize tu_stats fieldsSimon Marchi1-7/+7
Initialize fields of tu_stats to 0, remove the explicit default initialization of dwarf2_per_bfd::tu_stats. Change-Id: I98b2d5c4171291a3df2569466559174fb7cf32b6 Approved-By: Tom Tromey <tom@tromey.com>