From 55fc1623f942fba10362cb199f9356d75ca5835b Mon Sep 17 00:00:00 2001
From: Tom Tromey <tromey@adacore.com>
Date: Thu, 3 Nov 2022 13:49:17 -0600
Subject: Add name canonicalization for C

PR symtab/29105 shows a number of situations where symbol lookup can
result in the expansion of too many CUs.

What happens is that lookup_signed_typename will try to look up a type
like "signed int".  In cooked_index_functions::expand_symtabs_matching,
when looping over languages, the C++ case will canonicalize this type
name to be "int" instead.  Then this method will proceed to expand
every CU that has an entry for "int" -- i.e., nearly all of them.  A
crucial component of this is that the caller, objfile::lookup_symbol,
does not do this canonicalization, so when it tries to find the symbol
for "signed int", it fails -- causing the loop to continue.

This patch fixes the problem by introducing name canonicalization for
C.  The idea here is that, by making C and C++ agree on the canonical
name when a symbol name can have multiple spellings, we avoid the bad
behavior in objfile::lookup_symbol (and any other such code -- I don't
know if there is any).

Unlike C++, C only has a few situations where canonicalization is
needed.  And, in particular, due to the lack of overloading (thus
avoiding any issues in linespec) and due to the way c-exp.y works, I
think that no canonicalization is needed during symbol lookup -- only
during symtab construction.  This explains why lookup_name_info is not
touched.

The stabs reader is modified on a "best effort" basis.

The DWARF reader needed one small tweak in dwarf2_name to avoid a
regression in dw2-unusual-field-names.exp.  I think this is adequately
explained by the comment, but basically this is a scenario that should
not occur in real code, only the gdb test suite.

lookup_signed_typename is simplified.  It used to search for two
different type names, but now gdb can search just for the canonical
form.

gdb.dwarf2/enum-type.exp needed a small tweak, because the
canonicalizer turns "unsigned integer" into "unsigned int integer".
It seems better here to use the correct C type name.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
Tested-by: Simon Marchi <simark@simark.ca>
Reviewed-by: Andrew Burgess <aburgess@redhat.com>
---
 gdb/c-lang.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

(limited to 'gdb/c-lang.c')
diff --git a/gdb/c-lang.c b/gdb/c-lang.c
index e15541f..46c0da0 100644
--- a/gdb/c-lang.c
+++ b/gdb/c-lang.c
@@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)
 
 
 
+/* See c-lang.h.  */
+
+gdb::unique_xmalloc_ptr<char>
+c_canonicalize_name (const char *name)
+{
+  if (strchr (name, ' ') != nullptr
+      || streq (name, "signed")
+      || streq (name, "unsigned"))
+    return cp_canonicalize_string (name);
+  return nullptr;
+}
+
+
+
 void
 c_language_arch_info (struct gdbarch *gdbarch,
 		      struct language_arch_info *lai)
-- 
cgit v1.1