author    Simon Marchi <simon.marchi@polymtl.ca>  2025-02-09 00:51:04 -0500
committer Simon Marchi <simon.marchi@efficios.com>  2025-02-10 11:28:56 -0500
commit    13ab441fb40abffd9bb2a2ac32e1c69cba6c3198 (patch)
tree      32c2a1ecd68449dd3a36c3685e8dc3a84dc0f734 /gdb/dwarf2/read-debug-names.c
parent    de33cf88daf58cf8322867389afa9bcf3e377696 (diff)
gdb/dwarf: create multiple cooked index shards when reading .debug_names
New in v2:

- install address map in a single shard
- update test gdb.mi/mi-sym-info.exp to cope with the fact that different
  symbols could be returned when using --max-results

When playing with the .debug_names reader, I noticed it was significantly
slower than the DWARF scanner.  Using a "performance" build of GDB (with
optimization, no runtime sanitizer enabled, etc.), I measure with the
following command on a rather large debug info file (~4 GB):

    $ time ./gdb -q -nx --data-directory=data-directory <binary> -iex 'maint set dwarf sync on' -batch

This measures the time it takes for GDB to build the cooked index (plus some
startup and exit overhead).  I have a version of the binary without
.debug_names and a version with .debug_names added using gdb-add-index.  The
results are:

- without .debug_names: 7.5 seconds
- with .debug_names: 24 seconds

This is a bit embarrassing, given that the purpose of .debug_names is to
accelerate things :).  The reason is that the .debug_names processing is not
parallelized at all, while the DWARF scanner is heavily parallelized.

The process of creating the cooked index from .debug_names is roughly in two
steps:

1. scanning of .debug_names and creation of cooked index entries (see
   mapped_debug_names_reader::scan_all_names)
2. finalization of the index: name canonicalization and sorting of the
   entries (see cooked_index::set_contents)

This patch grabs a low-hanging fruit by creating multiple cooked index shards
instead of a single one during step one.  Just doing this allows the second
step of the processing to be automatically parallelized, as each shard is
sent to a separate thread to be finalized.  With this patch, I get:

- without .debug_names: 7.5 seconds
- with .debug_names: 9.7 seconds

Not as fast as we'd like, but it's an improvement.  The process of scanning
.debug_names could also be parallelized to shave off a few seconds.  My
profiling shows that out of those ~10 seconds of execution, about 6 are
inside scan_all_names.
Assuming perfect parallelization with 8 threads, it means that at best we
could shave about 5 seconds from that time, which sounds interesting.  I gave
it a shot, but it's a much more intrusive change, so I'm not sure I will
finish it.

This patch caused some regressions in gdb.mi/mi-sym-info.exp with the
cc-with-debug-names board, in the test about the `--max-results` switch.  It
appears that this test is relying on the specific symbols returned when using
`--max-results`.  As far as I know, we don't guarantee which specific symbols
are returned, so any of the matching symbols could be returned.  The
round-robin method used in this patch to assign index entries to shards ends
up somewhat randomizing which CU gets expanded first during the symbol
search, and therefore the order in which they appear in the objfile's CU
list, and therefore which one gets searched first.

I meditated on whether keeping compunits sorted within objfiles would help
make things more stable and predictable.  It would help somewhat, but it
wouldn't remove all sources of randomness.  It would still be possible for a
call to `expand_symtabs_matching` to stop on the first hit.  Which compunit
gets expanded then would still depend on the specific
`quick_symbol_functions` internal details / implementation.

Commit 5b99c5718f1c ("[gdb/testsuite] Fix various issues in
gdb.mi/mi-sym-info.exp") had already started to make the test a bit more
flexible in terms of which symbols it accepts, but with this patch, I think
it's possible to get wildly varying results.  I therefore modified the test
to count the number of returned symbols, but not expect any specific symbol.

Change-Id: Ifd39deb437781f72d224ec66daf6118830042941
Approved-By: Tom Tromey <tom@tromey.com>
Diffstat (limited to 'gdb/dwarf2/read-debug-names.c')
-rw-r--r--  gdb/dwarf2/read-debug-names.c  39
1 file changed, 31 insertions(+), 8 deletions(-)
diff --git a/gdb/dwarf2/read-debug-names.c b/gdb/dwarf2/read-debug-names.c
index aeddab66b..fe31a58 100644
--- a/gdb/dwarf2/read-debug-names.c
+++ b/gdb/dwarf2/read-debug-names.c
@@ -28,6 +28,7 @@
#include "read.h"
#include "stringify.h"
#include "extract-store-integer.h"
+#include "gdbsupport/thread-pool.h"
/* This is just like cooked_index_functions, but overrides a single
method so the test suite can distinguish the .debug_names case from
@@ -113,7 +114,14 @@ struct mapped_debug_names_reader
std::unordered_map<ULONGEST, index_val> abbrev_map;
- std::unique_ptr<cooked_index_shard> shard;
+ /* Even though the scanning of .debug_names and creation of the cooked index
+ entries is done serially, we create multiple shards so that the
+ finalization step can be parallelized. The shards are filled in a round
+ robin fashion. */
+ std::vector<cooked_index_shard_up> shards;
+
+ /* Next shard to insert an entry in. */
+ int next_shard = 0;
/* Maps entry pool offsets to cooked index entries. */
gdb::unordered_map<ULONGEST, cooked_index_entry *>
@@ -267,8 +275,14 @@ mapped_debug_names_reader::scan_one_entry (const char *name,
/* Skip if we couldn't find a valid CU/TU index. */
if (per_cu != nullptr)
{
- *result = shard->add (die_offset, (dwarf_tag) indexval.dwarf_tag, flags,
- lang, name, nullptr, per_cu);
+ *result
+ = shards[next_shard]->add (die_offset, (dwarf_tag) indexval.dwarf_tag,
+ flags, lang, name, nullptr, per_cu);
+
+ ++next_shard;
+ if (next_shard == shards.size ())
+ next_shard = 0;
+
entry_pool_offsets_to_entries.emplace (offset_in_entry_pool, *result);
}
@@ -403,14 +417,13 @@ cooked_index_debug_names::do_reading ()
complaint_handler.release (),
std::move (exceptions),
parent_map ());
- std::vector<std::unique_ptr<cooked_index_shard>> indexes;
- indexes.push_back (std::move (m_map.shard));
cooked_index *table
= (gdb::checked_static_cast<cooked_index *>
(per_bfd->index_table.get ()));
+
/* Note that this code never uses IS_PARENT_DEFERRED, so it is safe
to pass nullptr here. */
- table->set_contents (std::move (indexes), &m_warnings, nullptr);
+ table->set_contents (std::move (m_map.shards), &m_warnings, nullptr);
bfd_thread_cleanup ();
}
@@ -818,8 +831,18 @@ do_dwarf2_read_debug_names (dwarf2_per_objfile *per_objfile)
&addrmap, &warnings);
warnings.emit ();
- map.shard = std::make_unique<cooked_index_shard> ();
- map.shard->install_addrmap (&addrmap);
+ const auto n_workers
+ = std::max<std::size_t> (gdb::thread_pool::g_thread_pool->thread_count (),
+ 1);
+
+ /* Create as many index shard as there are worker threads. */
+ for (int i = 0; i < n_workers; ++i)
+ map.shards.emplace_back (std::make_unique<cooked_index_shard> ());
+
+ /* There is a single address map for the whole index (coming from
+ .debug_aranges). We only need to install it into a single shard for it to
+ get searched by cooked_index. */
+ map.shards[0]->install_addrmap (&addrmap);
auto cidn = (std::make_unique<cooked_index_debug_names>
(per_objfile, std::move (map)));