riscv-gnu-toolchain/gdb.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Pedro Alves <palves@redhat.com>	2017-11-08 14:22:32 +0000
committer	Pedro Alves <palves@redhat.com>	2017-11-08 16:02:24 +0000
commit	3f563c840a2c891ec2868b3e08bfaecb6f7aa57f (patch)
tree	1ebc19ff0d6da326ccb47a3242b0ec14c99535ac /gdb/python
parent	b5ec771e60c1a0863e51eb491c85c674097e9e13 (diff)
download	gdb-3f563c840a2c891ec2868b3e08bfaecb6f7aa57f.zip gdb-3f563c840a2c891ec2868b3e08bfaecb6f7aa57f.tar.gz gdb-3f563c840a2c891ec2868b3e08bfaecb6f7aa57f.tar.bz2

Optimize .gdb_index symbol name searching

As mentioned in the previous patch, .gdb_index name lookup got significantly slower with the previous patch. This patch addresses that, and in the process makes .gdb_index name searching faster than what we had before the previous patch, even. Using the same test: $ cat script.cmd set pagination off set $count = 0 while $count < 400 complete b string_prin printf "count = %d\n", $count set $count = $count + 1 end $ time gdb --batch -q ./gdb-with-index -ex "source script.cmd" I got, before the previous patch (-O2, x86-64): real 0m1.773s user 0m1.737s sys 0m0.040s and after this patch: real 0m1.361s user 0m1.315s sys 0m0.040s The basic idea here is simple: instead of always iterating over all the symbol names in the index, we build an accelerator/sorted name table and binary search names in it. Later in the series, we'll want to support wild matching for C++ too, so this mechanism already considers that. For example, say that you're looking up functions/methods named "func", no matter the containing namespace/class. If we sorted the table by qualified name, then we obviously wouldn't be able to find those symbols with a binary search: func ns1::a::b::func ns1::b::func ns2::func (function symbol names in .gdb_index have no parameter info, like psymbols) To address that out, we put an entry for each name component in the sorted table. something like this: Table Entry Actual symbol --------------------------------- func func func ns1::a::b::func b::func ns1::a::b::func a::b::func ns1::a::b::func ns1::a::b::func ns1::a::b::func func ns1::b::func b::func ns1::b::func ns1::b::func ns1::b::func func ns2::func ns2::func ns2::func Which sorted results in this: Table Entry Actual symbol --------------------------------- a::b::func ns1::a::b::func b::func ns1::a::b::func b::func ns1::b::func func func func ns1::a::b::func func ns1::b::func func ns2::func ns1::a::b::func ns1::a::b::func ns1::b::func ns1::b::func ns2::func ns2::func And we can binary search this. Note that a binary search approach works for both completion and regular lookup, while a name hashing approach only works for normal symbol looking, since obviously "fun" and "func" have different hashes. At first I was a bit wary of these tables potentially growing GDB's memory significantly. But I did an experiment that convinced it's not a worry at all. I hacked gdb to count the total number of entries in all the tables, attached that gdb to my system/Fedora's Firefox (Fedora's debug packages uses .gdb_index), did "set max-completions unlimited", and then hit "b [TAB]" to cause everything to expand. That resulted in 1351355 name_components. Each entry takes 8 bytes, so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB. That's IMO too small to worry about, given GDB was using over 7400MB total at that point. I.e., we're talking about 0.1% increase. dw2_expand_symtabs_matching unit tests covering this will be added in a follow up patch. If the size of this table turns out to be a concern, I have an idea to reduce the size of the table further at the expense of a bit more code -- the vast majority of the name offsets are either 0 or fit in 8-bits: total name_component = 1351355, of which, name_component::name_offset instances need 0 bits = 679531 name_component::name_offset instances need 8 bits = 669526 name_component::name_offset instances need 16 bits = 2298 name_component::name_offset instances need 32 bits = 0 name_component::idx instances need 0 bits = 51 name_component::idx instances need 8 bits = 8361 name_component::idx instances need 16 bits = 280329 name_component::idx instances need 32 bits = 1062614 so we could have separate tables for 0 name_offset, 8-bit name_offset and 32-bit name_offset. That'd give us roughly: 679531 * 0 + 669526 * 1 + 2298 * 4 + 1062614 * 4 = 4929174, or ~4.7MB with only 8-bit and 32-bit tables, that'd be: 1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB. I don't think we need to bother though. I also timed: $ time gdb --batch -q -p `pidof firefox` $ time gdb --batch -q -p `pidof firefox` -ex "b main" $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b " and compared before previous patch vs this patch, and I didn't see a significant difference, seemingly because time to read debug info dominates. The "complete b " variant of the test takes ~2min currently... (I have a follow up series that speeds that up somewhat.) gdb/ChangeLog: 2017-11-08 Pedro Alves <palves@redhat.com> * dwarf2read.c (byte_swap, MAYBE_SWAP): Move higher up in file. (struct name_component): New. (mapped_index::name_components): New field. (mapped_index::symbol_name_at): New method. (dwarf2_read_index): Call mapped_index ctor. (dw2_map_matching_symbols): Add comment about name_components table. (dw2_expand_symtabs_matching): Factor part to... (dw2_expand_symtabs_matching_symbol): ... this new function. Build name components table, and lookup symbols in it before calling the name matcher. (dw2_expand_marked_cus): New, factored out from dw2_expand_symtabs_matching. (dwarf2_per_objfile_free): Call the mapped_index's dtor.

Diffstat (limited to 'gdb/python')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: