aboutsummaryrefslogtreecommitdiff
path: root/gdb/testsuite/lib/future.exp
diff options
context:
space:
mode:
authorSergio Durigan Junior <sergiodj@redhat.com>2019-04-23 18:17:57 -0400
committerSergio Durigan Junior <sergiodj@redhat.com>2019-04-25 14:21:18 -0400
commit57e5e645010430b3d73f8c6a757d09f48dc8f8d5 (patch)
tree60c3a8f2dd47acc450b5965d8d02be0ae989f2cf /gdb/testsuite/lib/future.exp
parentdd06d4d6889ee58b76255b775f52ba8e475a7a2d (diff)
downloadgdb-57e5e645010430b3d73f8c6a757d09f48dc8f8d5.zip
gdb-57e5e645010430b3d73f8c6a757d09f48dc8f8d5.tar.gz
gdb-57e5e645010430b3d73f8c6a757d09f48dc8f8d5.tar.bz2
Implement dump of mappings with ELF headers by gcore
This patch has a long story, but it all started back in 2015, with commit df8411da087dc05481926f4c4a82deabc5bc3859 ("Implement support for checking /proc/PID/coredump_filter"). The purpose of that commit was to bring GDB's corefile generation closer to what the Linux kernel does. However, back then, I did not implement the full support for the dumping of memory mappings containing ELF headers (like mappings of DSOs or executables). These mappings were being dumped most of time, though, because the default value of /proc/PID/coredump_filter is 0x33, which would cause anonymous private mappings (DSOs/executable code mappings have this type) to be dumped. Well, until something happened on binutils... A while ago, I noticed something strange was happening with one of our local testcases on Fedora GDB: it was failing due to some strange build-id problem. On Fedora GDB, we (unfortunately) carry a bunch of "local" patches, and some of these patches actually extend upstream's build-id support in order to generate more useful information for the user of a Fedora system (for example, when the user loads a corefile into GDB, we detect whether the executable that generated that corefile is present, and if it's not we issue a warning suggesting that it should be installed, while also providing the build-id of the executable). A while ago, Fedora GDB stopped printing those warnings. I wanted to investigate this right away, and spent some time trying to determine what was going on, but other things happened and I got sidetracked. Meanwhile, the bug started to be noticed by some of our users, and its priority started changing. Then, someone on IRC also mentioned the problem, and when I tried helping him, I noticed he wasn't running Fedora. Hm... So maybe the bug was *also* present upstream. After "some" time investigating, and with a lot of help from Keith and others, I was finally able to determine that yes, the bug is also present upstream, and that even though it started with a change in ld, it is indeed a GDB issue. So, as I said, the problem started with binutils, more specifically after the following commit was pushed: commit f6aec96dce1ddbd8961a3aa8a2925db2021719bb Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Feb 27 11:34:20 2018 -0800 ld: Add --enable-separate-code This commit makes ld use "-z separate-code" by default on x86-64 machines. What this means is that code pages and data pages are now separated in the binary, which is confusing GDB when it tries to decide what to dump. BTW, Fedora 28 binutils doesn't have this code, which means that Fedora 28 GDB doesn't have the problem. From Fedora 29 on, binutils was rebased and incorporated the commit above, which started causing Fedora GDB to fail. Anyway, the first thing I tried was to pass "-z max-page-size" and specify a bigger page size (I saw a patch that did this and was proposed to Linux, so I thought it might help). Obviously, this didn't work, because the real "problem" is that ld will always use separate pages for code and data. So I decided to look into how GDB dumped the pages, and that's where I found the real issue. What happens is that, because of "-z separate-code", the first two pages of the ELF binary are (from /proc/PID/smaps): 00400000-00401000 r--p 00000000 fc:01 799548 /file Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 4 kB Private_Dirty: 0 kB Referenced: 4 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd mr mw me dw sd 00401000-00402000 r-xp 00001000 fc:01 799548 /file Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd ex mr mw me dw sd Whereas before, we had only one: 00400000-00401000 r-xp 00000000 fc:01 798593 /file Size: 4 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 4 kB Pss: 4 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 4 kB Referenced: 4 kB Anonymous: 4 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd ex mr mw me dw sd Notice how we have "Anonymous" data mapped into the page. This will be important. So, the way GDB decides which pages it should dump has been revamped by my patch in 2015, and now it takes the contents of /proc/PID/coredump_filter into account. The default value for Linux is 0x33, which means: Dump anonymous private, anonymous shared, ELF headers and HugeTLB private pages. Or: filter_flags filterflags = (COREFILTER_ANON_PRIVATE | COREFILTER_ANON_SHARED | COREFILTER_ELF_HEADERS | COREFILTER_HUGETLB_PRIVATE); Now, it is important to keep in mind that GDB doesn't always have *all* of the necessary information to exactly determine the type of a page, so the whole algorithm is based on heuristics (you can take a look at linux-tdep.c:dump_mapping_p and linux-tdep.c:linux_find_memory_regions_full for more info). Before the patch to make ld use "-z separate-code", the (single) page containing data and code was being flagged as an anonymous (due to the non-zero "Anonymous:" field) private (due to the "r-xp" permission), which means that it was being dumped into the corefile. That's why it was working fine. Now, as you can imagine, when "-z separate-code" is used, the *data* page (which is where the ELF notes are, including the build-id one) now doesn't have any "Anonymous:" mapping, so the heuristic is flagging it as file-backed private, which is *not* dumped by default. The next question I had to answer was: how come a corefile generated by the Linux kernel was correct? Well, the answer is that GDB, unlike Linux, doesn't actually implement the COREFILTER_ELF_HEADERS support. On Linux, even though the data page is also treated as a file-backed private mapping, it is also checked to see if there are any ELF headers in the page, and then, because we *do* have ELF headers there, it is dumped. So, after more time trying to think of ways to fix this, I was able to implement an algorithm that reads the first few bytes of the memory mapping being processed, and checks to see if the ELF magic code is present. This is basically what Linux does as well, except that, if it finds the ELF magic code, it just dumps one page to the corefile, whereas GDB will dump the whole mapping. But I don't think that's a big issue, to be honest. It's also important to explain that we *only* perform the ELF magic code check if: - The algorithm has decided *not* to dump the mapping so far, and; - The mapping is private, and; - The mapping's offset is zero, and; - The user has requested us to dump mappings with ELF headers. IOW, we're not going to blindly check every mapping. As for the testcase, I struggled even more trying to write it. Since our build-id support on upstream GDB is not very extensive, it's not really possible to determine whether a corefile contains build-id information or not just by using GDB. So, after thinking a lot about the problem, I decided to rely on an external tool, eu-unstrip, in order to verify whether the dump was successful. I verified the test here on my machine, and everything seems to work as expected (i.e., it fails without the patch, and works with the patch applied). We are working hard to upstream our "local" Fedora GDB patches, and we intend to submit our build-id extension patches "soon", so hopefully we'll be able to use GDB itself to perform this verification. I built and regtested this on the BuildBot, and no problems were found. gdb/ChangeLog: 2019-04-25 Sergio Durigan Junior <sergiodj@redhat.com> PR corefiles/11608 PR corefiles/18187 * linux-tdep.c (dump_mapping_p): Add new parameters ADDR and OFFSET. Verify if current mapping contains an ELF header. (linux_find_memory_regions_full): Adjust call to dump_mapping_p. gdb/testsuite/ChangeLog: 2019-04-25 Sergio Durigan Junior <sergiodj@redhat.com> PR corefiles/11608 PR corefiles/18187 * gdb.base/coredump-filter-build-id.exp: New file.
Diffstat (limited to 'gdb/testsuite/lib/future.exp')
-rw-r--r--gdb/testsuite/lib/future.exp10
1 files changed, 10 insertions, 0 deletions
diff --git a/gdb/testsuite/lib/future.exp b/gdb/testsuite/lib/future.exp
index a56cd01..122e652 100644
--- a/gdb/testsuite/lib/future.exp
+++ b/gdb/testsuite/lib/future.exp
@@ -162,6 +162,16 @@ proc gdb_find_readelf {} {
return $readelf
}
+proc gdb_find_eu-unstrip {} {
+ global EU_UNSTRIP_FOR_TARGET
+ if [info exists EU_UNSTRIP_FOR_TARGET] {
+ set eu_unstrip $EU_UNSTRIP_FOR_TARGET
+ } else {
+ set eu_unstrip [transform eu-unstrip]
+ }
+ return $eu_unstrip
+}
+
proc gdb_default_target_compile {source destfile type options} {
global target_triplet
global tool_root_dir