gdb: add gdbarch method to get execution context from core file

Add a new gdbarch method which can read the execution context from a core file. An execution context, for this commit, means the filename of the executable used to generate the core file and the arguments passed to the executable.

In later commits this will be extended to include the environment in which the executable was run, but this commit is already pretty big, so I've split that part out.

Initially this new gdbarch method is only implemented for Linux targets, but a later commit will add FreeBSD support too.

Currently when GDB opens a core file, GDB reports the command and arguments used to generate the core file. For example:

  (gdb) core-file ./core.521524
  [New LWP 521524]
  Core was generated by `./gen-core abc def'.

However, this information comes from the psinfo structure in the core file, and this struct only allows 80 characters for the command and arguments combined. If the command and arguments exceed this then they are truncated.

Additionally, neither the executable nor the arguments are quoted in the psinfo structure, so if, for example, the executable was named 'aaa bbb' (i.e. contains white space) and was run with the arguments 'ccc' and 'ddd', then when this core file was opened by GDB we'd see:

  (gdb) core-file ./core.521524
  [New LWP 521524]
  Core was generated by `./aaa bbb ccc ddd'.

It is impossible to know if 'bbb' is part of the executable filename, or another argument.

However, the kernel places the executable command onto the user stack, where it is pointed to by the AT_EXECFN entry in the auxv vector. Additionally, the inferior arguments are all available on the user stack. The new gdbarch method added in this commit extracts this information from the user stack and allows GDB to access it.

The information on the stack is writable by the user, so a user application can start up, edit the arguments, override the AT_EXECFN string, and then dump core. In this case GDB will report incorrect information. However, it is worth noting that the psinfo structure is also filled (by the kernel) by just copying information from the user stack, so if the user edits the on-stack arguments, the values reported in psinfo will change too. The new approach is therefore no worse than what we currently have.

The benefit of this approach is that GDB gets to report the full executable name and all the arguments without the 80 character limit, and GDB is aware which parts are the executable name, and which parts are arguments, so we can, for example, style the executable name.

Another benefit is that, now we know all the arguments, we can poke these into the inferior object. This means that after loading a core file a user can 'show args' to see the arguments used. A user could even transition from core file debugging to live inferior debugging using, e.g. 'run', and GDB would restart the inferior with the correct arguments.

Now the downside: finding the AT_EXECFN string is easy, the auxv entry points directly to it. However, finding the arguments is a little trickier. There's currently no easy way to get a direct pointer to the arguments. Instead, I've got a heuristic which I believe should find the arguments in most cases. The algorithm is laid out in linux-tdep.c, I'll not repeat it here, but it's basically a search of the user stack, starting from AT_EXECFN.

If the new heuristic fails then GDB just falls back to the old approach, asking bfd to read the psinfo structure for us, which gives the old 80 character limited answer.

For testing, I've run this series on (all GNU/Linux) x86-64, s390, and ppc64le, and the new test passes in each case. I've done some very basic testing on ARM, which does things a little differently from the other architectures mentioned; see the ARM-specific notes in linux_corefile_parse_exec_context_1 for details.
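
To make the psinfo ambiguity described in the commit message concrete, here is a small standalone sketch. It is not GDB or kernel code: `flatten_psargs` is a hypothetical helper, and the 80 character limit is simply the figure quoted in the commit message. The kernel's exact copying and truncation rules are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a GDB or kernel function): flatten an argument
// vector the way a psinfo-style record presents it, joining the entries
// with single spaces, without quoting, and keeping at most LIMIT
// characters.
std::string
flatten_psargs (const std::vector<std::string> &argv, std::size_t limit = 80)
{
  assert (limit > 0);

  std::string out;
  for (const std::string &arg : argv)
    {
      if (!out.empty ())
        out += ' ';
      out += arg;
    }

  // Anything beyond the limit is silently lost.
  if (out.size () > limit)
    out.resize (limit);
  return out;
}
```

Under these assumptions, `flatten_psargs ({"./aaa bbb", "ccc", "ddd"})` and `flatten_psargs ({"./aaa", "bbb", "ccc", "ddd"})` produce the same string, which is why the flattened form cannot tell a reader where the executable name ends.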
author: Andrew Burgess <aburgess@redhat.com> 2024-04-25 09:36:43 +0100
committer: Andrew Burgess <aburgess@redhat.com> 2024-12-24 14:15:24 +0000
commit: d3d13bf876aae425ae0eff2ab0f1af9f7da0264a (patch)
tree: 96b4683cb2a1974d84650f798c58552e0996bb91 /gdb/linux-tdep.c
parent: 1eb397a6d20b312df11e787533f32d2312ced215 (diff)
1 file changed, 293 insertions, 0 deletions
diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index d345205..8d506fe 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -1835,6 +1835,297 @@ linux_corefile_thread (struct thread_info *info,
     }
 }
 
+/* Try to extract the inferior arguments, environment, and executable name
+   from core file CBFD.  */
+
+static core_file_exec_context
+linux_corefile_parse_exec_context_1 (struct gdbarch *gdbarch, bfd *cbfd)
+{
+  gdb_assert (gdbarch != nullptr);
+
+  /* If there's no core file loaded then we're done.  */
+  if (cbfd == nullptr)
+    return {};
+
+  /* This function (currently) assumes the stack grows down.  If this is
+     not the case then this function isn't going to help.  */
+  if (!gdbarch_stack_grows_down (gdbarch))
+    return {};
+
+  int ptr_bytes = gdbarch_ptr_bit (gdbarch) / TARGET_CHAR_BIT;
+
+  /* Find the .auxv section in the core file. The BFD library creates this
+     for us from the AUXV note when the BFD is opened.  If the section
+     can't be found then there's nothing more we can do.  */
+  struct bfd_section * section = bfd_get_section_by_name (cbfd, ".auxv");
+  if (section == nullptr)
+    return {};
+
+  /* Grab the contents of the .auxv section.  If we can't get the contents
+     then there's nothing more we can do.  */
+  bfd_size_type size = bfd_section_size (section);
+  if (bfd_section_size_insane (cbfd, section))
+    return {};
+  gdb::byte_vector contents (size);
+  if (!bfd_get_section_contents (cbfd, section, contents.data (), 0, size))
+    return {};
+
+  /* Parse the .auxv section looking for the AT_EXECFN attribute.  The
+     value of this attribute is a pointer to a string, the string is the
+     executable command.  Additionally, this string is placed at the top of
+     the program stack, and so will be in the same PT_LOAD segment as the
+     argv and envp arrays.  We can use this to try and locate these arrays.
+     If we can't find the AT_EXECFN attribute then we're not going to be
+     able to do anything else here.  */
+  CORE_ADDR execfn_string_addr;
+  if (target_auxv_search (contents, current_inferior ()->top_target (),
+			  gdbarch, AT_EXECFN, &execfn_string_addr) != 1)
+    return {};
+
+  /* Read in the program headers from CBFD.  If we can't do this for any
+     reason then just give up.  */
+  long phdrs_size = bfd_get_elf_phdr_upper_bound (cbfd);
+  if (phdrs_size == -1)
+    return {};
+  gdb::unique_xmalloc_ptr<Elf_Internal_Phdr>
+    phdrs ((Elf_Internal_Phdr *) xmalloc (phdrs_size));
+  int num_phdrs = bfd_get_elf_phdrs (cbfd, phdrs.get ());
+  if (num_phdrs == -1)
+    return {};
+
+  /* Now scan through the headers looking for the one which contains the
+     address held in EXECFN_STRING_ADDR, this is the address of the
+     executable command pointed to by the AT_EXECFN auxv entry.  */
+  Elf_Internal_Phdr *hdr = nullptr;
+  for (int i = 0; i < num_phdrs; i++)
+    {
+      /* The program header that contains the address EXECFN_STRING_ADDR
+	 should be one where all content is contained within CBFD, hence
+	 the check that the file size matches the memory size.  */
+      if (phdrs.get ()[i].p_type == PT_LOAD
+	  && phdrs.get ()[i].p_vaddr <= execfn_string_addr
+	  && (phdrs.get ()[i].p_vaddr
+	      + phdrs.get ()[i].p_memsz) > execfn_string_addr
+	  && phdrs.get ()[i].p_memsz == phdrs.get ()[i].p_filesz)
+	{
+	  hdr = &phdrs.get ()[i];
+	  break;
+	}
+    }
+
+  /* If we failed to find a suitable program header then give up.  */
+  if (hdr == nullptr)
+    return {};
+
+  /* As we assume the stack grows down (see early check in this function)
+     we know that the information we are looking for sits somewhere between
+     EXECFN_STRING_ADDR and the segment's virtual address.  These define
+     the HIGH and LOW addresses between which we are going to search.  */
+  CORE_ADDR low = hdr->p_vaddr;
+  CORE_ADDR high = execfn_string_addr;
+
+  /* This PTR is going to be the address we are currently accessing.  */
+  CORE_ADDR ptr = align_down (high, ptr_bytes);
+
+  /* Set up DEREF, a helper function which loads a value from an address.
+     The returned value is always placed into a uint64_t, even if we only
+     load 4-bytes, this allows the code below to be pretty generic.  All
+     the values we're dealing with are unsigned, so this should be OK.   */
+  enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
+  const auto deref = [=] (CORE_ADDR p) -> uint64_t
+    {
+      ULONGEST value = read_memory_unsigned_integer (p, ptr_bytes, byte_order);
+      return (uint64_t) value;
+    };
+
+  /* Now search down through memory looking for a PTR_BYTES sized object
+     which contains the value EXECFN_STRING_ADDR.  The hope is that this
+     will be the AT_EXECFN entry in the auxv table.  There is no guarantee
+     that we'll find the auxv table this way, but we will do our best to
+     validate that what we find is the auxv table, see below.  */
+  while (ptr > low)
+    {
+      if (deref (ptr) == execfn_string_addr
+	  && (ptr - ptr_bytes) > low
+	  && deref (ptr - ptr_bytes) == AT_EXECFN)
+	break;
+
+      ptr -= ptr_bytes;
+    }
+
+  /* If we reached the lower bound then we failed -- bail out.  */
+  if (ptr <= low)
+    return {};
+
+  /* Assuming that we are looking at a value field in the auxv table, move
+     forward PTR_BYTES bytes so we are now looking at the next key field in
+     the auxv table, then scan forward until we find the null entry which
+     will be the last entry in the auxv table.  */
+  ptr += ptr_bytes;
+  while ((ptr + (2 * ptr_bytes)) < high
+	 && (deref (ptr) != 0 || deref (ptr + ptr_bytes) != 0))
+    ptr += (2 * ptr_bytes);
+
+  /* PTR now points to the null entry in the auxv table, or we think it
+     does.  Now we want to find the start of the auxv table.  There's no
+     in-memory pattern we can search for at the start of the table, but
+     we can find the start based on the size of the .auxv section within
+     the core file CBFD object.  In the actual core file the auxv is held
+     in a note, but the bfd library makes this into a section for us.
+
+     The addition of (2 * PTR_BYTES) here is because PTR is pointing at the
+     null entry, but the null entry is also included in CONTENTS.  */
+  ptr = ptr + (2 * ptr_bytes) - contents.size ();
+
+  /* If we reached the lower bound then we failed -- bail out.  */
+  if (ptr <= low)
+    return {};
+
+  /* PTR should now be pointing to the start of the auxv table mapped into
+     the inferior memory.  As we got here using a heuristic then let's
+     compare an auxv table sized block of inferior memory, if this matches
+     then it's not a guarantee that we are in the right place, but it does
+     make it more likely.  */
+  gdb::byte_vector target_contents (size);
+  if (target_read_memory (ptr, target_contents.data (), size) != 0)
+    memory_error (TARGET_XFER_E_IO, ptr);
+  if (memcmp (contents.data (), target_contents.data (), size) != 0)
+    return {};
+
+  /* We have reasonable confidence that PTR points to the start of the auxv
+     table.  Below this should be the null terminated list of pointers to
+     environment strings, and below that the null terminated list of
+     pointers to arguments strings.  After that we should find the
+     argument count.  First, check for the null at the end of the
+     environment list.  */
+  if (deref (ptr - ptr_bytes) != 0)
+    return {};
+
+  ptr -= (2 * ptr_bytes);
+  while (ptr > low && deref (ptr) != 0)
+    ptr -= ptr_bytes;
+
+  /* If we reached the lower bound then we failed -- bail out.  */
+  if (ptr <= low)
+    return {};
+
+  /* PTR is now pointing to the null entry at the end of the argument
+     string pointer list.  We now want to scan backward to find the entire
+     argument list.  There's no handy null marker that we can look for
+     here, instead, as we scan backward we look for the argument count
+     (argc) value which appears immediately before the argument list.
+
+     Technically, we could have zero arguments, so the argument count would
+     be zero, however, we don't support this case.  If we find a null entry
+     in the argument list before we find the argument count then we just
+     bail out.
+
+     Start by moving to the last argument string pointer, we expect this
+     to be non-null.  */
+  ptr -= ptr_bytes;
+  uint64_t argc = 0;
+  while (ptr > low)
+    {
+      uint64_t val = deref (ptr);
+      if (val == 0)
+	return {};
+
+      if (val == argc)
+	break;
+
+      /* For GNU/Linux on ARM, glibc removes argc from the stack and
+	 replaces it with the "stack-limit".  This actually means a pointer
+	 to the first argument string.  This is unfortunate, but we can
+	 still detect this case.  */
+      if (val == (ptr + ptr_bytes))
+	break;
+
+      argc++;
+      ptr -= ptr_bytes;
+    }
+
+  /* If we reached the lower bound then we failed -- bail out.  */
+  if (ptr <= low)
+    return {};
+
+  /* PTR is now pointing at the argument count value (or where the argument
+     count should be, see notes on ARM above).  Move it forward so we're
+     pointing at the first actual argument string pointer.  */
+  ptr += ptr_bytes;
+
+  /* We can now parse all of the argument strings.  */
+  std::vector<gdb::unique_xmalloc_ptr<char>> arguments;
+
+  /* Skip the first argument.  This is the executable command, but we'll
+     load that separately later.  */
+  ptr += ptr_bytes;
+
+  uint64_t v;
+  while ((v = deref (ptr)) != 0)
+    {
+      gdb::unique_xmalloc_ptr<char> str = target_read_string (v, INT_MAX);
+      if (str == nullptr)
+	return {};
+      arguments.emplace_back (std::move (str));
+      ptr += ptr_bytes;
+    }
+
+  /* Skip the null-pointer at the end of the argument list.  We will now
+     be pointing at the first environment string.  */
+  ptr += ptr_bytes;
+
+  /* Parse the environment strings.  Nothing is done with this yet, but
+     will be in a later commit.  */
+  std::vector<gdb::unique_xmalloc_ptr<char>> environment;
+  while ((v = deref (ptr)) != 0)
+    {
+      gdb::unique_xmalloc_ptr<char> str = target_read_string (v, INT_MAX);
+      if (str == nullptr)
+	return {};
+      environment.emplace_back (std::move (str));
+      ptr += ptr_bytes;
+    }
+
+  gdb::unique_xmalloc_ptr<char> execfn
+    = target_read_string (execfn_string_addr, INT_MAX);
+  if (execfn == nullptr)
+    return {};
+
+  return core_file_exec_context (std::move (execfn),
+				 std::move (arguments));
+}
+
+/* Parse and return execution context details from core file CBFD.  */
+
+static core_file_exec_context
+linux_corefile_parse_exec_context (struct gdbarch *gdbarch, bfd *cbfd)
+{
+  /* Catch and discard memory errors.
+
+     If the core file format is not as we expect then we can easily trigger
+     a memory error while parsing the core file.  We don't want this to
+     prevent the user from opening the core file; the information provided
+     by this function is helpful, but not critical, debugging can continue
+     without it.  Instead just give a warning and return an empty context
+     object.  */
+  try
+    {
+      return linux_corefile_parse_exec_context_1 (gdbarch, cbfd);
+    }
+  catch (const gdb_exception_error &ex)
+    {
+      if (ex.error == MEMORY_ERROR)
+	{
+	  warning
+	    (_("failed to parse execution context from corefile: %s"),
+	     ex.message->c_str ());
+	  return {};
+	}
+      else
+	throw;
+    }
+}
+
 /* Fill the PRPSINFO structure with information about the process being
    debugged.  Returns 1 in case of success, 0 for failures.  Please note that
    even if the structure cannot be entirely filled (e.g., GDB was unable to
@@ -2785,6 +3076,8 @@ linux_init_abi (struct gdbarch_info info, struct gdbarch *gdbarch,
   set_gdbarch_infcall_mmap (gdbarch, linux_infcall_mmap);
   set_gdbarch_infcall_munmap (gdbarch, linux_infcall_munmap);
   set_gdbarch_get_siginfo_type (gdbarch, linux_get_siginfo_type);
+  set_gdbarch_core_parse_exec_context (gdbarch,
+				       linux_corefile_parse_exec_context);
 }
 
 void _initialize_linux_tdep ();
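
The backward stack walk in linux_corefile_parse_exec_context_1 can be easier to follow on a toy model. The sketch below is not GDB code and makes several simplifying assumptions: 64-bit words only, the stack held in a `std::vector`, strings kept in a side table, a fixed three-entry auxv, and no comparison against the `.auxv` section contents. The names `toy_stack`, `make_toy_stack`, `find_args`, `TOY_AT_EXECFN` and `TOY_AUXV_BYTES` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

constexpr uint64_t TOY_AT_EXECFN = 31;      // Value of AT_EXECFN in <elf.h>.
constexpr uint64_t TOY_AUXV_BYTES = 6 * 8;  // Three (key, value) pairs.

// Toy stack image: WORDS[i] lives at address BASE + 8 * i, and STRINGS
// maps a string's address to its contents.
struct toy_stack
{
  uint64_t base;
  std::vector<uint64_t> words;
  std::map<uint64_t, std::string> strings;

  uint64_t deref (uint64_t addr) const
  { return words.at ((addr - base) / 8); }
};

// Build the layout (low to high): argc, argv[], NULL, envp[], NULL, auxv.
toy_stack
make_toy_stack (const std::vector<std::string> &argv,
                const std::vector<std::string> &envp, uint64_t &execfn_addr)
{
  assert (!argv.empty ());
  toy_stack s { 0x7ffc00000000, {}, {} };
  uint64_t next = s.base + 0x1000;  // String area, above the words.
  auto place = [&] (const std::string &str)
    {
      uint64_t addr = next;
      s.strings[addr] = str;
      next += str.size () + 1;
      return addr;
    };

  s.words.push_back (0);             // Unrelated word below argc.
  s.words.push_back (argv.size ());  // argc.
  for (const std::string &a : argv)
    s.words.push_back (place (a));
  s.words.push_back (0);
  for (const std::string &e : envp)
    s.words.push_back (place (e));
  s.words.push_back (0);
  execfn_addr = place (argv[0]);
  // auxv: AT_PAGESZ (6), AT_EXECFN, then the AT_NULL terminator.
  s.words.insert (s.words.end (), {6, 4096, TOY_AT_EXECFN, execfn_addr, 0, 0});
  return s;
}

// Return the arguments after argv[0], or nothing if the walk fails.
std::optional<std::vector<std::string>>
find_args (const toy_stack &s, uint64_t execfn_addr)
{
  const uint64_t low = s.base, end = s.base + 8 * s.words.size ();

  // Search downward for the (AT_EXECFN, EXECFN_ADDR) pair.
  uint64_t ptr = end - 8;
  while (ptr > low && !(s.deref (ptr) == execfn_addr
                        && s.deref (ptr - 8) == TOY_AT_EXECFN))
    ptr -= 8;
  if (ptr <= low)
    return std::nullopt;

  // Move to the next key, then scan forward to the AT_NULL entry.
  ptr += 8;
  while (ptr + 16 <= end && (s.deref (ptr) != 0 || s.deref (ptr + 8) != 0))
    ptr += 16;
  if (ptr + 16 > end || ptr + 16 - low <= TOY_AUXV_BYTES)
    return std::nullopt;

  // Rewind by the auxv size; the word below must be envp's NULL.
  ptr = ptr + 16 - TOY_AUXV_BYTES;
  if (s.deref (ptr - 8) != 0)
    return std::nullopt;

  // Skip the envp pointers down to the NULL that ends argv.
  ptr -= 16;
  while (ptr > low && s.deref (ptr) != 0)
    ptr -= 8;

  // Count argv entries downward until a word equals the count (argc),
  // or points at the next word (the ARM case noted in the patch).
  uint64_t argc = 0;
  for (ptr -= 8; ptr > low; ptr -= 8, argc++)
    {
      uint64_t val = s.deref (ptr);
      if (val == 0)
        return std::nullopt;
      if (val == argc || val == ptr + 8)
        break;
    }
  if (ptr <= low)
    return std::nullopt;

  // Skip argc and argv[0], then read strings up to the NULL.
  std::vector<std::string> args;
  for (ptr += 16; s.deref (ptr) != 0; ptr += 8)
    {
      auto it = s.strings.find (s.deref (ptr));
      if (it == s.strings.end ())
        return std::nullopt;
      args.push_back (it->second);
    }
  return args;
}
```

Building a stack with `make_toy_stack ({"./aaa bbb", "ccc", "ddd"}, {"HOME=/root"}, execfn)` and passing it to `find_args` should recover the two arguments after the executable name, while the string at `execfn` keeps the name, spaces included. Where this sketch uses bounds checks and `std::vector::at`, the real function relies on its wrapper catching `MEMORY_ERROR` exceptions.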