path: root/lldb/source/Plugins/ObjectFile
6 days · [lldb][NFC] Module, ModuleSpec, GetSectionData use DataExtractorSP (#178347) · Jason Molenda · 20 files changed, -250/+286
In a PR last month I changed the ObjectFile CreateInstance etc. methods to accept an optional DataExtractorSP instead of a DataBufferSP, and to retain the extractor in a shared pointer internally in all of the ObjectFile subclasses. This lays the groundwork for using a VirtualDataExtractor for some Mach-O binaries on macOS, where the segments of the binary are out of order in actual memory, and we add a lookup table so that the TEXT segment appears at offset 0 in the extractor, etc.

Working on the actual implementation, I realized we were still using DataBufferSPs in ModuleSpec and Module, as well as in ObjectFile::GetModuleSpecifications. I originally made a much larger NFC change where all ObjectFile subclasses operated on DataExtractors throughout their implementation, as well as in the DWARF parser. It was a very large patchset. Many subclasses start with their DataExtractor, then create smaller DataExtractors for parts of the binary image (the string table, the symbol table, etc.) for processing.

After consideration and discussion with Jonas, we agreed that a segment/section of a binary will never require a lookup table to access the bytes within it, so I changed VirtualDataExtractor::GetSubsetExtractorSP to (1) require that the subset be contained within a single lookup table entry, and (2) return a simple DataExtractor bounded on that byte range. By doing this, I was able to remove all of my very invasive changes to the ObjectFile subclass internals; care is only needed when they operate on the entire binary image.

One pattern that subclasses like ObjectFileBreakpad use is to take an ArrayRef of the DataBuffer for a binary, create a StringRef over it, and then look for strings in it. With a VirtualDataExtractor and out-of-order binary segments with gaps between them, that pattern would search the entire buffer for a string and segfault when it reached an unmapped region of the buffer. I added a VirtualDataExtractor::GetSubsetExtractorSP(0) which gets the largest contiguous memory region starting at offset 0 for this use case, and I added a comment about what is being done there because it is not obvious, and people not working on macOS wouldn't be familiar with the requirement. (When we have a ModuleSpec with a DataExtractor, any of the ObjectFile subclasses get a shot at creating, so they all have to be able to iterate on these.)

rdar://148939795
8 days · [lldb][NFC] Mark Symbol pointers as const where easily possible (#177472) · Alex Langford · 2 files changed, -8/+12
These are the places that required no modifications to surrounding code.
8 days · [lldb] Fix unchecked llvm::Expected in ObjectFileWasm (#178299) · Jonas Devlieghere · 1 file changed, -4/+10
Don't discard the llvm::Error when we fail to parse the module or field name.
12 days · [lldb] Fix data buffer regression in ObjectFile (#177724) · Jonas Devlieghere · 1 file changed, -2/+1
This fixes a regression in `ObjectFile` and `ObjectFileELF` introduced by #171574. The original code created a `DataBuffer` using `MapFileDataWritable`:

```
data_sp = MapFileDataWritable(*file, length, file_offset);
if (!data_sp)
  return nullptr;
data_offset = 0;
```

The new code requires converting the `DataBuffer` to a `DataExtractor`:

```
DataBufferSP buffer_sp = MapFileDataWritable(*file, length, file_offset);
if (!buffer_sp)
  return nullptr;
extractor_sp = std::make_shared<DataExtractor>();
extractor_sp->SetData(buffer_sp, data_offset, buffer_sp->GetByteSize());
data_offset = 0;
```

The issue is that once we get a data buffer back from MapFileDataWritable, we don't have to adjust for the `data_offset` again when calling `SetData`, as the `DataBuffer` is already normalized to have a zero start offset. A similar issue exists in `ObjectFile`.

rdar://168317174
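A minimal sketch of the corrected pattern, assuming the surrounding ObjectFileELF code quoted above; since the mapped buffer already starts at the requested file offset, the extractor is seeded with offset 0:

```
// Sketch only: the mapped buffer is already rebased to file_offset, so the
// DataExtractor must not apply data_offset a second time.
lldb::DataBufferSP buffer_sp = MapFileDataWritable(*file, length, file_offset);
if (!buffer_sp)
  return nullptr;
auto extractor_sp = std::make_shared<lldb_private::DataExtractor>();
extractor_sp->SetData(buffer_sp, /*offset=*/0, buffer_sp->GetByteSize());
data_offset = 0;
```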
12 days · [lldb] Avoid redundant calls to `std::shared_ptr::get` (NFC) (#177720) · Jonas Devlieghere · 3 files changed, -31/+24
Avoid redundant calls to `std::shared_ptr::get()`. The class provides a dereference operator and using that is the standard, idiomatic way to access the underlying object.
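For illustration only, the idiomatic form this cleanup favors (types and names here are hypothetical, not lldb's):

```
#include <memory>

struct Symtab {
  void Dump() {}
};

void Example(const std::shared_ptr<Symtab> &symtab_sp) {
  if (symtab_sp)
    symtab_sp->Dump(); // idiomatic: operator-> on the shared_ptr itself
  // instead of: symtab_sp.get()->Dump();  // redundant call to get()
}
```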
2026-01-13 · Fix typos and spelling errors across codebase (#156270) · Austin Jiang · 1 file changed, -5/+5
Corrected various spelling mistakes such as 'occurred', 'receiver', 'initialized', 'length', and others in comments, variable names, function names, and documentation throughout the project. These changes improve code readability and maintain consistency in naming and documentation. Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
2026-01-02 · [LLVM][ADT] Migrate users of `make_scope_exit` to CTAD (#174030) · Victor Chernyakin · 1 file changed, -1/+1
This is a followup to #173131, which introduced the CTAD functionality.
2025-12-11 · [lldb][NFC] Change ObjectFile argument type (#171574) · Jason Molenda · 20 files changed, -166/+244
The ObjectFile plugin interface accepts an optional DataBufferSP argument. If the caller has the contents of the binary, it can provide them in that DataBufferSP. The ObjectFile subclasses, in their CreateInstance methods, will fill in the DataBufferSP with the actual binary contents if it is not set. The ObjectFile base class creates an ivar DataExtractor from the DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor with the binary data and needs to pass that in to the ObjectFile plugin, instead of the bag-of-bytes DataBufferSP. It builds on the previous patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP so I could pass in a subclass in the shared ptr. And it will be using the VirtualDataExtractor that Jonas added in https://github.com/llvm/llvm-project/pull/168802.

No behavior is changed by the patch; we're simply moving the creation of the DataExtractor to the caller, instead of a DataBuffer that is immediately used to set up the ObjectFile DataExtractor. The patch is a bit complicated because all of the ObjectFile subclasses have to initialize their DataExtractor to pass in to the base class. I ran the testsuite on macOS and on AArch64 Ubuntu.

(btw David, I ran it under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the SME; that explains why the testsuite passes)

rdar://148939795

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
2025-12-09 · [lldb][Wasm] Handle imports when parsing Wasm name sections (#170960) · Derek Schuff · 2 files changed, -10/+75
LLDB can use the wasm name section to populate its symbol table and get names for functions. However the index space used in the name section is the "function index space" which includes imported as well as locally defined functions.
2025-12-01 · [lldb][NFC] Change ObjectFile's DataExtractor to a shared ptr (#170066) · Jason Molenda · 7 files changed, -184/+203
ObjectFile has an m_data DataExtractor ivar which may be default constructed initially, or initialized with a DataBuffer passed in to its ctor. If the DataExtractor does not get a DataBuffer source passed in, the subclass will initialize it with access to the object file's data. When a DataBuffer is passed in to the base class ctor, the DataExtractor only has its buffer initialized; ObjectFile doesn't yet know the address size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar which is always initialized with at least a default-constructed DataExtractor object in the base class ctor. The next patch I will be writing is to change the ObjectFile ctor to take an optional DataExtractorSP, so the caller can pass a DataExtractor subclass -- the VirtualDataExtractor being added via https://github.com/llvm/llvm-project/pull/168802 -- instead of a DataBuffer which is trivially saved into the DataExtractor.

The change is otherwise mechanical; all `m_data.` changed to `m_data_sp->` and all the places where `m_data` was passed in for a by-ref call were changed to `*m_data_sp.get()`. The shared pointer is always initialized to contain an object.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for getting the Linux testsuite to run on SME-only systems David). All of the ObjectFile subclasses I modified compile cleanly, but I haven't tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795
2025-11-06 · [LLDB] Fix debuginfo ELF files overwriting Unified Section List (#166635) · Jacob Lalonde · 1 file changed, -4/+27
Recently I've been deep diving ELF cores in LLDB, aspiring to move LLDB closer to GDB in capability. One issue I encountered was a system lib losing its unwind plan when loading the debuginfo. The reason for this was that the debuginfo has the eh_frame section stripped and the main executable did not. The root cause was this line in [ObjectFileELF](https://github.com/llvm/llvm-project/blob/163933e9e7099f352ff8df1973f9a9c3d7def6c5/lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp#L1972):

```
// For eTypeDebugInfo files, the Symbol Vendor will take care of updating the
// unified section list.
if (GetType() != eTypeDebugInfo)
  unified_section_list = *m_sections_up;
```

This would always be executed because CalculateType can never return eTypeDebugInfo:

```
ObjectFile::Type ObjectFileELF::CalculateType() {
  switch (m_header.e_type) {
  case llvm::ELF::ET_NONE: // 0 - No file type
    return eTypeUnknown;
  case llvm::ELF::ET_REL: // 1 - Relocatable file
    return eTypeObjectFile;
  case llvm::ELF::ET_EXEC: // 2 - Executable file
    return eTypeExecutable;
  case llvm::ELF::ET_DYN: // 3 - Shared object file
    return eTypeSharedLibrary;
  case ET_CORE: // 4 - Core file
    return eTypeCoreFile;
  default:
    break;
  }
  return eTypeUnknown;
}
```

This makes sense, as there isn't an explicit sh_type to denote that the file is debuginfo. After some discussion with @clayborg and @GeorgeHuyubo we settled on joining the existing unified section list with whatever new sections were being added: adding each new unique section, or taking the section with the maximum file size. We picked this strategy to pick the section with the most information. In most scenarios, LHS should be SHT_NOBITS and RHS would be SHT_PROGBITS. A diagram documenting the existing vs. proposed behavior is attached to the PR.
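A rough sketch of the merge strategy described above, with hypothetical function and method names (the real patch operates on lldb's SectionList/Section types, whose exact signatures may differ):

```
#include "lldb/Core/Section.h"

// Sketch: add any incoming section that is not already present; if both
// files contain a section with the same name, keep the copy with the larger
// file size (e.g. SHT_PROGBITS over SHT_NOBITS).
void MergeIntoUnifiedList(lldb_private::SectionList &unified,
                          const lldb_private::SectionList &incoming) {
  for (size_t i = 0; i < incoming.GetSize(); ++i) {
    lldb::SectionSP new_sect = incoming.GetSectionAtIndex(i);
    lldb::SectionSP existing = unified.FindSectionByName(new_sect->GetName());
    if (!existing)
      unified.AddSection(new_sect);
    else if (new_sect->GetFileSize() > existing->GetFileSize())
      unified.ReplaceSection(existing->GetID(), new_sect);
  }
}
```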
2025-11-06 · [lldb] Add function to tell if a section is a GOT section (#165936) · Augusto Noronha · 2 files changed, -0/+16
A global offset table is a section that holds the address of functions that are dynamically linked. The Swift plugin needs to know if sections are a global offset table or not.
2025-11-03 · [lldb] Fix unaligned writes in ObjectFileELF (#165759) · Alex Langford · 1 file changed, -6/+4
The code to apply relocations was sometimes creating unaligned destination pointers. Instead of giving them an explicit type (i.e. `uint64_t *`) and forcing the compiler to generate unaligned stores, mark the pointer as `void *`. The compiler will figure out the correct series of store instructions.
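As a sketch of the general technique (not necessarily the exact code in the patch), writing the relocated value through memcpy avoids any alignment assumption the compiler would attach to a typed destination pointer:

```
#include <cstdint>
#include <cstring>

// Write a 64-bit relocation result to a possibly-unaligned destination.
// memcpy through a void* carries no alignment assumption, so the compiler
// emits whatever store sequence is safe on the target.
static void WriteRelocation(void *dst, uint64_t value) {
  memcpy(dst, &value, sizeof(value)); // instead of *(uint64_t *)dst = value;
}
```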
2025-11-02 · [ADT] Prepare to deprecate variadic `StringSwitch::Cases`. NFC. (#166020) · Jakub Kuderski · 3 files changed, -4/+4
Update all uses of variadic `.Cases` to use the initializer list overload instead. I plan to mark variadic `.Cases` as deprecated in a followup PR. For more context, see https://github.com/llvm/llvm-project/pull/163117.
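For context, the migration looks roughly like this (values and the mapping are illustrative; the initializer-list overload is the one this PR moves to):

```
#include "llvm/ADT/StringSwitch.h"

// Illustrative only: classify a section name using the initializer-list
// overload of Cases instead of the variadic form.
int Classify(llvm::StringRef name) {
  return llvm::StringSwitch<int>(name)
      .Cases({".text", ".init", ".fini"}, 1) // was: .Cases(".text", ".init", ".fini", 1)
      .Cases({".data", ".bss"}, 2)
      .Default(0);
}
```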
2025-10-30 · Enable LLDB to load large dSYM files. (#164471) · Greg Clayton · 1 file changed, -8/+22
llvm-dsymutil can produce mach-o files where some sections in __DWARF exceed the 4GB barrier and subsequent sections in the dSYM will be inaccessible because the mach-o section_64 structure only has a 32 bit file offset. This patch enables LLDB to load a large dSYM file by figuring out when this happens and properly adjusting the file offset of the LLDB sections. I was unable to add a test as obj2yaml and yaml2obj are broken for mach-o files and they can't convert a yaml file back into a valid mach-o object file. Any suggestions for adding a test would be appreciated.
2025-10-02 · Reland "[lldb][MachO][NFC] Extract ObjC metadata symbol parsing into helper function" (#161655) · Michael Buch · 1 file changed, -115/+51
This reverts `5a80fb9177e3c831c9c574400a13d77393397f2a`. The original change got reverted because of failing tests on macOS. The issue was that I changed the scope of setting `type = eSymbolTypeData` during the cleanup. This patch relands the original patch but doesn't change the `else` branch to an `else if` branch. Tested that the macOS test-suite passes.
2025-10-01 · Revert "[lldb][MachO][NFC] Extract ObjC metadata symbol parsing into helper function (#161536)" · Augusto Noronha · 1 file changed, -54/+124
This reverts commit 23e081524fd9f64fb3430822e879b6dc36a1d3f1.
2025-10-01 · [lldb][MachO][NFC] Extract ObjC metadata symbol parsing into helper function (#161536) · Michael Buch · 1 file changed, -124/+54
Just a simple de-duplication of the same code. We saw a bug here recently (https://github.com/llvm/llvm-project/pull/161521). Might as well isolate this all in one place. rdar://158159242
2025-10-01 · [lldb][MachO] Fix inspection of global variables that start with 'O' (#161521) · Michael Buch · 1 file changed, -48/+42
On Darwin, C symbols are prefixed with a '_'. The LLDB Mach-O parser handles Objective-C metadata symbols starting with '_OBJC' specially. Previously, global symbols starting with a '_O' prefix were lost because of incorrectly scoped if-guards. This patch removes those checks. There is more cleanup that can be done in this file because there's a bunch of duplicated checks for these ObjC symbols. I decided to leave that for an NFC follow-up. Depends on https://github.com/llvm/llvm-project/pull/161520 rdar://158159242
2025-09-25 · Modify ObjectFileELF so it can load notes from PT_NOTE segments. (#160652) · Greg Clayton · 1 file changed, -0/+18
The ObjectFileELF parser was not able to load ELF notes from PT_NOTE program headers. This patch fixes ObjectFileELF::GetUUID() to check the program header and parse the notes in any PT_NOTE segments. This will allow memory ELF files to extract the UUID from an in memory image that has no section headers. Added a test that creates an ELF file, strips all section headers, and then makes sure that LLDB can see the UUID value.
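A rough sketch of the idea, using the generic ELF note layout (4-byte namesz/descsz/type header followed by 4-byte-padded name and descriptor); this is illustrative and not the ObjectFileELF code itself:

```
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative: walk one PT_NOTE segment's bytes looking for a GNU build-id
// note (name "GNU", type NT_GNU_BUILD_ID == 3) and return its descriptor
// bytes, which a GetUUID() implementation can use as the UUID.
static std::vector<uint8_t> FindBuildID(const uint8_t *p, size_t size) {
  auto align4 = [](uint32_t v) { return (v + 3u) & ~3u; };
  size_t off = 0;
  while (off + 12 <= size) {
    uint32_t namesz, descsz, type;
    memcpy(&namesz, p + off, 4);
    memcpy(&descsz, p + off + 4, 4);
    memcpy(&type, p + off + 8, 4);
    const uint8_t *name = p + off + 12;
    const uint8_t *desc = name + align4(namesz);
    if (desc + descsz > p + size)
      break; // malformed or truncated note
    if (type == 3 /*NT_GNU_BUILD_ID*/ && namesz == 4 &&
        memcmp(name, "GNU", 4) == 0)
      return std::vector<uint8_t>(desc, desc + descsz);
    off += 12 + align4(namesz) + align4(descsz);
  }
  return {};
}
```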
2025-09-19 · [lldb][MachO] Local structs for larger VA offsets (#159849) · Jason Molenda · 2 files changed, -43/+171
The Mach-O file format has several load commands which specify the location of data in the file as UInt32 offsets. lldb uses these same structures to track the offsets of the binary in virtual address space when it is running. Normally a binary is loaded in memory contiguously, so this is fine, but on Darwin systems there is a "system shared cache" where all system libraries are combined into one region of memory and pre-linked. The shared cache has the TEXT segments for every binary loaded contiguously, then the DATA segments, and finally a shared common LINKEDIT segment for all binaries. The virtual address offset from the TEXT segment for a library to the LINKEDIT may exceed 4GB of virtual address space depending on the structure of the shared cache, so this use of a UInt32 offset will not work.

There was an initial instance of this issue that I fixed last November in https://github.com/llvm/llvm-project/pull/117832, where I fixed this issue for the LC_SYMTAB / `symtab_command` structure. But we have the same issue now with three additional structures: `linkedit_data_command`, `dyld_info_command`, and `dysymtab_command`. For all of these we can see the pattern of `dyld_info.export_off += linkedit_slide` applied to the offset fields in ObjectFileMachO.

This defines local structures that mirror the Mach-O structures, except that they use UInt64 offset fields so we can reuse the same field for a large virtual address offset at runtime. I defined ctors from the genuine structures, as well as operator= methods, so the structures can be read from the Mach-O binary into the standard object, then copied into our local expanded versions of them. These structures are ABI in Mach-O and cannot change their layout.

The alternative is to create local variables alongside these Mach-O load command objects for the offsets that we care about, adjust those by the correct VA offsets, and only use those local variables instead of the fields in the objects. I took the approach of the local enhanced structure in November and I think it is the cleaner approach.

rdar://160384968
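A condensed sketch of the pattern described above, with a hypothetical struct name and only the fields of `linkedit_data_command` shown (the real patch mirrors the full set of load commands it names):

```
#include "llvm/BinaryFormat/MachO.h"
#include <cstdint>

// Sketch: a local mirror of linkedit_data_command whose offset field is
// widened to 64 bits, so a large virtual-address slide applied at runtime
// cannot overflow. Constructed/assigned from the genuine Mach-O struct.
struct linkedit_data_command_64 {
  uint32_t cmd = 0, cmdsize = 0;
  uint64_t dataoff = 0; // widened from uint32_t in the ABI struct
  uint64_t datasize = 0;

  linkedit_data_command_64() = default;
  linkedit_data_command_64(const llvm::MachO::linkedit_data_command &lc)
      : cmd(lc.cmd), cmdsize(lc.cmdsize), dataoff(lc.dataoff),
        datasize(lc.datasize) {}
  linkedit_data_command_64 &
  operator=(const llvm::MachO::linkedit_data_command &lc) {
    cmd = lc.cmd;
    cmdsize = lc.cmdsize;
    dataoff = lc.dataoff;
    datasize = lc.datasize;
    return *this;
  }
};
```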
2025-09-09 · [lldb] Unwind through ARM Cortex-M exceptions automatically (#153922) · Jason Molenda · 2 files changed, -0/+38
When a processor faults/is interrupted/gets an exception, it will stop running code and jump to an exception catcher routine. Most processors will store the pc that was executing in a system register, and the catcher functions have special instructions to retrieve that & possibly other registers. It may then save those values to stack, and the author can add .cfi directives to tell lldb's unwinder where to find those saved values.

ARM Cortex-M (microcontroller) processors have a simpler mechanism where a fixed set of registers is saved to the stack on an exception, and a unique value is put in the link register to indicate to the caller that this has taken place. No special handling needs to be written into the exception catcher, unless it wants to inspect these preserved values. And it is possible for a general stack walker to walk the stack with no special knowledge about what the catch function does.

This patch adds an Architecture plugin method to allow an Architecture to override/augment the UnwindPlan that lldb would use for a stack frame, given the contents of the return address register. It resembles a feature where the LanguageRuntime can replace/augment the unwind plan for a function, but it is doing it offset by one level. The LanguageRuntime is looking at the local register context and/or symbol name to decide if it will override the unwind rules. For the Cortex-M exception unwinds, we need to modify THIS frame's unwind plan if the CALLER's LR had a specific value. RegisterContextUnwind has to retrieve the caller's LR value before it has completely decided on the UnwindPlan it will use for THIS stack frame.

This does mean that we will need one additional read of stack memory than we currently do when unwinding, on Armv7 Cortex-M targets. The unwinder walks the stack lazily, as stack frames are requested, and so now if you ask for 2 stack frames, we will read enough stack to walk 2 frames, plus we will read one extra word of memory, the spilled LR value from the stack. In practice, with 512-byte memory cache reads, this is unlikely to be a real performance hit.

This PR includes a test with a yaml corefile description and a JSON ObjectFile, incorporating all of the necessary stack memory and symbol names from a real debug session I worked on. The architectural default unwind plans are used for all stack frames except the 0th because there are no instructions for the functions, and no unwind info. I may need to add an encoding of unwind rules to ObjectFileJSON in the future as we create more test cases like this.

This PR depends on the yaml2macho-core utility from https://github.com/llvm/llvm-project/pull/153911 to run its API test.

rdar://110663219
2025-08-27 · [LLDB] Omit loading local symbols in LLDB symbol table (#154809) · barsolo2000 · 1 file changed, -1/+40
https://discourse.llvm.org/t/rfc-should-we-omit-local-symbols-in-elf-files-from-the-lldb-symbol-table/87384

Improving symbolication by excluding local symbols that are typically not useful for debugging or symbol lookups. This aligns with the discussion that local symbols, especially those with STB_LOCAL binding and STT_NOTYPE type (including .L-prefixed symbols), often interfere with symbol resolution and can be safely omitted.

Co-authored-by: Bar Soloveychik <barsolo@fb.com>
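One plausible reading of the filtering rule, expressed against llvm::ELF constants; this is illustrative and the actual condition in the patch may combine these criteria differently:

```
#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/ELF.h"
#include <cstdint>

// Illustrative: skip symbols that carry no useful information for lookups,
// i.e. STB_LOCAL binding, STT_NOTYPE type, and compiler-generated ".L" names.
static bool ShouldOmitSymbol(uint8_t st_info, llvm::StringRef name) {
  const uint8_t binding = st_info >> 4;
  const uint8_t type = st_info & 0x0f;
  return binding == llvm::ELF::STB_LOCAL && type == llvm::ELF::STT_NOTYPE &&
         name.starts_with(".L");
}
```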
2025-08-26 · [lldb] Correctly parse Wasm segments (#154727) · Jonas Devlieghere · 2 files changed, -128/+214
My original implementation for parsing Wasm segments was wrong in two related ways. I had a bug in calculating the file vm address and I didn't fully understand the difference between active and passive segments and how that impacted their file vm address. With this PR, we now support parsing init expressions for active segments, rather than just skipping over them. This is necessary to determine where they get loaded. Similar to llvm-objdump, we currently only support simple opcodes (i.e. constants). We also currently do not support active segments that use a non-zero memory index. However this covers all segments for a non-trivial Swift binary compiled to Wasm.
2025-08-26 · [lldb] Do not use LC_FUNCTION_STARTS data to determine symbol size as symbols are created (#155282) · Alex Langford · 1 file changed, -122/+0
Note: This is a resubmission of #106791. I had to revert this a year ago for a failing test that I could not understand. I have time now to try and get this in again.

Summary: This improves the performance of ObjectFileMachO::ParseSymtab by removing eager and expensive work in favor of doing it later in a less-expensive fashion.

Experiment: My goal was to understand LLDB's startup time. First, I produced a Debug build of LLDB (no dSYM) and a Release+NoAsserts build of LLDB. The Release build debugged the Debug build as it debugged a small C++ program. I found that ObjectFileMachO::ParseSymtab accounted for somewhere between 1.2 and 1.3 seconds consistently. After applying this change, I consistently measured a reduction of approximately 100ms, putting the time closer to 1.1s and 1.2s on average.

Background: ObjectFileMachO::ParseSymtab will incrementally create symbols by parsing nlist entries from the symtab section of a MachO binary. As it does this, it eagerly tries to determine the size of symbols (e.g. how long a function is) using LC_FUNCTION_STARTS data (or eh_frame if LC_FUNCTION_STARTS is unavailable). Concretely, this is done by performing a binary search on the function starts array and calculating the distance to the next function or the end of the section (whichever is smaller). However, this work is unnecessary for 2 reasons:

1. If you have debug symbol entries (i.e. STABs), the size of a function is usually stored right after the function's entry. Performing this work right before parsing the next entry is unnecessary work.
2. Calculating symbol sizes for symbols of size 0 is already performed in `Symtab::InitAddressIndexes` after all the symbols are added to the Symtab. It also does this more efficiently by walking over a list of symbols sorted by address, so the work to calculate the size per symbol is constant instead of O(log n).
2025-08-19 · [lldb] Improve error handling in ObjectFileWasm (#154433) · Jonas Devlieghere · 1 file changed, -77/+102
Improve error handling in ObjectFileWasm by using helpers that wrap their result in an llvm::Expected. The helper to read a Wasm string now returns an Expected<std::string> and I created a helper to parse 32-bit ULEBs that returns an Expected<uint32_t>.
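A minimal sketch of the ULEB helper style described here, built on llvm::decodeULEB128; the name and exact shape are assumptions, not the actual helper in ObjectFileWasm:

```
#include "llvm/Support/Error.h"
#include "llvm/Support/LEB128.h"
#include <cstdint>

// Sketch: decode a 32-bit ULEB128 at *offset, advancing the offset on
// success and returning a descriptive llvm::Error otherwise.
static llvm::Expected<uint32_t> ReadULEB32(const uint8_t *data, size_t size,
                                           size_t *offset) {
  unsigned bytes_read = 0;
  const char *error = nullptr;
  uint64_t value =
      llvm::decodeULEB128(data + *offset, &bytes_read, data + size, &error);
  if (error)
    return llvm::createStringError(llvm::inconvertibleErrorCode(), "%s", error);
  if (value > UINT32_MAX)
    return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                   "ULEB value does not fit in 32 bits");
  *offset += bytes_read;
  return static_cast<uint32_t>(value);
}
```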
2025-08-19 · [lldb] Create sections for Wasm segments (#153634) · Jonas Devlieghere · 1 file changed, -19/+65
This is a continuation of #153494. In a WebAssembly file, the "name" section contains names for the segments in the data section (WASM_NAMES_DATA_SEGMENT). We already parse these as symbols, and with this PR, we now also create sub-sections for each of the segments.
2025-08-14 · [lldb] Support parsing data symbols from the Wasm name section (#153494) · Jonas Devlieghere · 1 file changed, -22/+92
This PR adds support for parsing the data symbols from the WebAssembly name section, which consists of a name and address range for the segments in the Wasm data section. Unlike other object file formats, Wasm has no symbols for referencing items within those segments (i.e. symbols the user has defined).
2025-08-13 · [lldb] Use numeric_limits for all overflow checks in ObjectFileWasm (#153332) · Jonas Devlieghere · 1 file changed, -6/+6
Use std::numeric_limits<uint32_t>::max() for all overflow checks in ObjectFileWasm and fix a few locations where I incorrectly used `>=` instead of `>`.
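For illustration only (not the patch's exact checks), the distinction being fixed: an offset or size equal to the 32-bit maximum is still representable, so the guard must use strict greater-than:

```
#include <cstdint>
#include <limits>

// Illustrative: reject an offset/size pair only when it cannot be
// represented in 32 bits. Using ">=" here would wrongly reject the exact
// maximum value.
static bool FitsIn32Bits(uint64_t offset, uint64_t size) {
  const uint64_t max = std::numeric_limits<uint32_t>::max();
  return offset <= max && size <= max && offset + size <= max;
}
```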
2025-08-12 · [lldb] Support parsing the Wasm symbol table (#153093) · Jonas Devlieghere · 2 files changed, -10/+136
This PR adds support for parsing the WebAssembly symbol table. The symbol table is encoded in the "names" section and contains names and indexes into other sections. For now we only support parsing function (code) symbols. The result is that you can set breakpoints by symbol name, while previously breakpoints by name required debug info (DWARF). This is also necessary for Swift, which checks for the presence of `swift_release` as a heuristic to determine if there's a static Swift stdlib.
2025-08-11 · [Minidump] Update Minidump file builder to continue when the Module's section cannot be found (#152009) · barsolo2000 · 1 file changed, -22/+31
Instead of returning an error when:
- it can't obtain section information from a module, or
- there are other issues calculating the size,

we now log the error and continue with the other modules.

Tested with lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump.py

Co-authored-by: Bar Soloveychik <barsolo@fb.com>
2025-07-30 · [lldb] Support DW_OP_WASM_location in DWARFExpression (#151010) · Jonas Devlieghere · 1 file changed, -1/+5
Add support for DW_OP_WASM_location in DWARFExpression. This PR rebases #78977 and cleans up the unit test. The DWARF extensions are documented at https://yurydelendik.github.io/webassembly-dwarf/ and supported by LLVM-based toolchains such as Clang, Swift, Emscripten, and Rust.
2025-07-24 · [lldb] Fix uninitialized memory access. (#150544) · Jorge Gorbe Moya · 1 file changed, -3/+3
lldb/test/API/functionalities/process_save_core_minidump/TestProcessSaveCoreMinidump64b.py fails under msan with uninitialized memory access errors. The problem is that a few structs are written to the dump without having been fully initialized. This change makes them default-initialized so dumping the fields that aren't explicitly written to won't trigger UB.
2025-07-18 · [LLDB] Fix Memory64 BaseRVA, move all non-stack memory to Mem64. (#146777) · Jacob Lalonde · 1 file changed, -39/+39
### Context
Over a year ago, I landed support for 64b memory ranges in Minidump (#95312). In that patch we added the Memory64 list stream, which is effectively a linked list on disk. The layout is a sixteen byte header and then however many memory descriptors.

### The Bug
This is a classic off-by-one error, where I added 8 bytes instead of 16 for the header. This caused the first region to start 8 bytes before the correct RVA, thus shifting all memory reads by 8 bytes. We are correctly writing all the regions to disk, with no physical corruption, but the RVA is defined wrong, meaning we were incorrectly reading memory.

### Why wasn't this caught?
One problem we've had is forcing Minidump to actually use the 64b mode; it would be a massive waste of resources to have a test that actually wrote >4.2gb of IO to validate the 64b regions, and so almost all validation has been manual. As a weakness of manual testing, this issue is pseudo non-deterministic, as what regions end up in 64b or 32b is handled greedily and iterated in the order it's laid out in /proc/pid/maps. We often validated 64b was written correctly by hexdumping the Minidump itself, which was not corrupted (other than the BaseRVA).

### Why is this showing up now?
During internal usage, we had a bug report that the Minidump wasn't displaying values. I was unable to repro the issue, but during my investigation I saw the variables were in the 64b regions, which resulted in me identifying the bug.

### How do we prevent future regressions?
To prevent regressions, and honestly to save my sanity for figuring out where 8 bytes magically came from, I've added a new API to SBSaveCoreOptions: `SBSaveCoreOptions::GetMemoryRegionsToSave()`, the ability to get the memory regions that we intend to include in the coredump. I added this so we can compare what we intended to include versus what was actually included. Traditionally we've always had issues comparing regions because Minidump includes /proc/pid/maps, and it can be difficult to know whether a memory-region read failure was a genuine error or just a page that wasn't meant to be included.

We are also leveraging this API to choose the memory regions to be generated, as well as for testing which regions should be bytewise 1:1. After much debate with @clayborg, I've moved all non-stack memory to the Memory64 list. This list doesn't incur us any meaningful overhead, and Greg originally suggested doing this in the original 64b PR. This also means we're exercising the 64b path every single time we save a Minidump, preventing regressions on this feature from slipping through testing in the future.

Snippet produced by [minidump.py](https://github.com/clayborg/scripts):
```
MINIDUMP_MEMORY_LIST:
NumberOfMemoryRanges = 0x00000002
MemoryRanges[0] = [0x00007f61085ff9f0 - 0x00007f6108601000) @ 0x0003f655
MemoryRanges[1] = [0x00007ffe47e50910 - 0x00007ffe47e52000) @ 0x00040c65

MINIDUMP_MEMORY64_LIST:
NumberOfMemoryRanges = 0x000000000000002e
BaseRva = 0x0000000000042669
MemoryRanges[0] = [0x00005584162d8000 - 0x00005584162d9000)
MemoryRanges[1] = [0x00005584162d9000 - 0x00005584162db000)
MemoryRanges[2] = [0x00005584162db000 - 0x00005584162dd000)
MemoryRanges[3] = [0x00005584162dd000 - 0x00005584162ff000)
MemoryRanges[4] = [0x00007f6100000000 - 0x00007f6100021000)
MemoryRanges[5] = [0x00007f6108800000 - 0x00007f6108828000)
MemoryRanges[6] = [0x00007f6108828000 - 0x00007f610899d000)
MemoryRanges[7] = [0x00007f610899d000 - 0x00007f61089f9000)
MemoryRanges[8] = [0x00007f61089f9000 - 0x00007f6108a08000)
MemoryRanges[9] = [0x00007f6108bf5000 - 0x00007f6108bf7000)
```

### Misc
As a part of this fix I had to look at LLDB logs a lot; you'll notice I added `0x` to many of the PRIx64 `LLDB_LOGF` calls. This is so the user (or I) can directly copy-paste the address in the logs instead of adding the hex prefix themselves. Added some SBSaveCore tests for the new GetMemoryRegionsToSave API, and docstrings.

CC: @DavidSpickett, @da-viper, @labath because we've been working together on save-core plugins; review is optional and I didn't tag you but figured you'd want to know.
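For reference, a sketch of the fixed-size header whose length was miscounted, and of how BaseRva relates to it (variable names hypothetical; the minidump layout is NumberOfMemoryRanges and BaseRva, then the 16-byte descriptors, then the raw region bytes):

```
#include <cstdint>

// Sketch of the MINIDUMP_MEMORY64_LIST header: 16 bytes, not 8. BaseRva must
// point just past the header and all descriptors, where the raw bytes of the
// first region begin.
struct Memory64ListHeader {
  uint64_t NumberOfMemoryRanges;
  uint64_t BaseRva;
};
static_assert(sizeof(Memory64ListHeader) == 16, "header is 16 bytes");

uint64_t ComputeBaseRva(uint64_t stream_offset, uint64_t num_ranges) {
  const uint64_t descriptor_size = 16; // StartOfMemoryRange + DataSize
  return stream_offset + sizeof(Memory64ListHeader) +
         num_ranges * descriptor_size;
}
```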
2025-07-02 · [lldb] remove do-nothing defaults in case statements, unbreak gcc CI bots. · Jason Molenda · 1 file changed, -7/+0
2025-07-02 · [lldb] Fix warnings · Kazu Hirata · 1 file changed, -0/+3
This patch fixes:
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:415:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions]
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:536:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions]
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:672:7: error: label at end of compound statement is a C++23 extension [-Werror,-Wc++23-extensions]
2025-07-02 · [lldb][NFC][MachO] Clean up LC_THREAD reading code, remove i386 corefile (#146480) · Jason Molenda · 1 file changed, -177/+13
While fixing bugs in the x86_64 LC_THREAD parser in ObjectFileMachO, I noticed that the other LC_THREAD parsers are all less clear than they should be.

To recap, a Mach-O LC_THREAD load command has a byte size for the entire payload. Within the payload, there will be one or more register sets provided. A register set starts with a UInt32 "flavor", the type of register set defined in the system headers, and a UInt32 "count", the number of UInt32 words of memory for this register set. After one register set, there may be additional sets. A parser can skip an unknown register set flavor by using the count field to get to the next register set. When the total byte size of the LC_THREAD load command has been parsed, it is completed.

This patch fixes the riscv/arm/arm64 LC_THREAD parsers to use the total byte size as the exit condition, and to skip past unrecognized register sets, instead of stopping parsing.

Instead of fixing the i386 corefile support, I removed it. The last macOS that supported 32-bit Intel code was macOS 10.14 in 2018. I also removed i386 KDP support; 32-bit Intel kernel debugging hasn't been supported for even longer than that. It would be preferable to do these things separately, but I couldn't bring myself to update the i386 LC_THREAD parser, and it required very few changes to remove this support entirely.
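A skeleton of the parsing pattern described above; this is a hedged sketch with plain pointers, whereas the real code reads through lldb's DataExtractor:

```
#include <cstdint>
#include <cstring>

// Sketch: walk an LC_THREAD payload of total_bytes. Each register set is a
// uint32_t flavor, a uint32_t count (in 32-bit words), then count words of
// register data. Unknown flavors are skipped by advancing past their words;
// parsing ends when the whole payload has been consumed.
static void ParseLCThreadPayload(const uint8_t *p, size_t total_bytes) {
  size_t offset = 0;
  while (offset + 8 <= total_bytes) {
    uint32_t flavor, count;
    memcpy(&flavor, p + offset, 4);
    memcpy(&count, p + offset + 4, 4);
    offset += 8;
    if (offset + count * 4ull > total_bytes)
      break; // malformed payload
    // ... hand (flavor, p + offset, count * 4) to the matching register-set
    // reader, or fall through to skip a flavor this parser doesn't recognize.
    offset += count * 4ull;
  }
}
```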
2025-06-30 · [lldb][Mach-O] Fix several bugs in x86_64 Mach-O corefile reading, and one bug in the new RegisterContextUnifiedCore class (#146460) · Jason Molenda · 1 file changed, -40/+26
The PR I landed a few days ago to allow Mach-O corefiles to augment their registers with additional per-thread registers in metadata exposed a few bugs in the x86_64 corefile reader when running under different CI environments. It also showed a bug in my RegisterContextUnifiedCore class where I wasn't properly handling lookups of unknown registers (e.g. LLDB_GENERIC_RA when debugging an Intel target).

The Mach-O x86_64 corefile support would say that it had fpu & exc registers available in every corefile, regardless of whether they were actually present. It would only read the bytes for the first register flavor in the LC_THREAD, the GPRs, but it read them incorrectly, so sometimes you got more register context than you'd expect. The LC_THREAD register context specifies a flavor and the number of uint32_t words; the ObjectFileMachO method would read that number of uint64_t's, exceeding the GPR register space, but it was followed by FPU and then EXC register space so it didn't crash. If you had a corefile with GPR and EXC register bytes, it would be written into the GPR and then FPU register areas, with zeroes filling out the rest of the context.
2025-06-27 · [lldb][Mach-O] Allow "process metadata" LC_NOTE to supply registers (#144627) · Jason Molenda · 2 files changed, -20/+46
The "process metadata" LC_NOTE allows for thread IDs to be specified in a Mach-O corefile. This extends the JSON recognized in that LC_NOTE to allow for additional registers to be supplied on a per-thread basis.

The registers included in a Mach-O corefile LC_THREAD load command can only be one of the register flavors that the kernel (xnu) defines in <mach/arm/thread_status.h> for arm64 -- the general purpose registers, floating point registers, exception registers. JTAG-style corefile producers may have access to many additional registers beyond these that EL0 programs typically use, for instance TCR_EL1 on AArch64, and people developing low level code need access to these registers. This patch defines a format for including these registers for any thread.

The JSON in "process metadata" is a dictionary that must have a `threads` key. The value is an array of entries, one per LC_THREAD in the Mach-O corefile. The number of entries must match the LC_THREADs so they can be correctly associated. Each thread's dictionary must have two keys, `sets` and `registers`. `sets` is an array of register set names. If a register set name matches one from the LC_THREAD core registers, any registers that are defined will be added to that register set; e.g. metadata can add a register to the "General Purpose Registers" set that lldb shows users. `registers` is an array of dictionaries, one per register. Each register must have the keys `name`, `value`, `bitsize`, and `set`. It may provide additional keys like `alt-name` that `DynamicRegisterInfo::SetRegisterInfo` recognizes.

This `sets` + `registers` formatting is the same that the `target.process.python-os-plugin-path` script interface uses; both are parsed by `DynamicRegisterInfo`. The one addition is that in this LC_NOTE metadata, each register must also have a `value` field, with the value provided in big-endian base 10, as usual with JSON. A sketch of the JSON is shown below.

In RegisterContextUnifiedCore, I combine the register sets & registers from the LC_THREAD for a specific thread, and the metadata sets & registers for that thread from the LC_NOTE. Even if no LC_NOTE is present, this class ingests the LC_THREAD register contexts and reformats them to its internal stores before returning itself as the RegisterContext, instead of shortcutting and returning the core's native RegisterContext. I could have gone either way with that, but in the end I decided if the code is correct, we should live on it always.

I added a test where we process save-core to create a userland corefile, then use a utility "add-lcnote" to strip the existing "process metadata" LC_NOTE that lldb put in it, and add a new one from a JSON string.

rdar://74358787

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
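A hedged example of what the JSON payload might look like for a corefile with a single LC_THREAD, kept here as a C++ raw string; the register name, set name, and value are invented for the example, while the key names follow the description above:

```
// Illustrative metadata for one thread: one extra system register added to a
// custom register set. Values are made up.
static const char *g_process_metadata_json = R"json({
  "threads": [
    {
      "sets": ["General Purpose Registers", "EL1 System Registers"],
      "registers": [
        { "name": "tcr_el1", "value": 1234605616436508552,
          "bitsize": 64, "set": "EL1 System Registers" }
      ]
    }
  ]
})json";
```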
2025-06-24 · Reapply "[lldb/cmake] Plugin layering enforcement mechanism (#144543)" (#145305) · Pavel Labath · 1 file changed, -0/+2
The only difference from the original PR are the added BRIEF and FULL_DOCS arguments to define_property, which are required for cmake<3.23.
2025-06-23 · Revert "[lldb/cmake] Plugin layering enforcement mechanism (#144543)" · Pavel Labath · 1 file changed, -2/+0
Causes failures on several bots. This reverts commits 714b2fdf3a385e5b9a95c435f56b1696ec3ec9e8 and e7c1da7c8ef31c258619c1668062985e7ae83b70.
2025-06-23 · [lldb/cmake] Plugin layering enforcement mechanism (#144543) · Pavel Labath · 1 file changed, -0/+2
Some inter-plugin dependencies are okay, others are not. Yet others not, but we're sort of stuck with them. The idea is to be able to prevent backsliding while making sure that acceptable dependencies are.. accepted. For context, see https://github.com/llvm/llvm-project/pull/139170 and the attached changes to the documentation.
2025-06-17 · [lldb][AIX] Added XCOFF ParseSymtab handling (#141577) · Dhruv Srivastava · 1 file changed, -1/+101
This PR is in reference to porting LLDB on AIX. Link to discussions on llvm discourse and github: 1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640 2. https://github.com/llvm/llvm-project/issues/101657 The complete changes for porting are present in this draft PR: https://github.com/llvm/llvm-project/pull/102601 **Description:** Adding ParseSymtab logic after creating sections. It is able to handle both 32 and 64 bit symbols, without the need to add template logic. This is an incremental PR on top of my previous couple of XCOFF support commits.
2025-06-09 · [lldb][Mach-O] Fix DWARF5 debugging regression for Mach-O · Jason Molenda · 1 file changed, -0/+7
A unification of the DWARF section names, https://github.com/llvm/llvm-project/pull/141344 broke dwarf5 debugging with Mach-O files. The str_offset and str_offset.dwo names are different in Mach-O from other object files.
2025-06-09 · [LLDB] Unify DWARF section name matching (#141344) · nerix · 5 files changed, -175/+20
Different object file formats support DWARF sections (COFF, ELF, MachO, PE/COFF, WASM). COFF and PE/COFF only matched a subset. This caused some GCC executables produced on MinGW to have issues later on when debugging. One example is that `.debug_rnglists` was not matched, which caused range extraction to fail when printing a backtrace. This unifies the parsing of section names in `ObjectFile::GetDWARFSectionTypeFromName`, so all file formats can use the same naming convention. Since the prefixes are different, `GetDWARFSectionTypeFromName` only matches the suffixes (i.e. `.debug_` needs to be stripped before). I added two tests to ensure the sections are correctly identified on Windows executables.
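An abbreviated sketch of the shared suffix matching described here; the enum and function names are placeholders, and the full suffix list lives in ObjectFile:

```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSwitch.h"

enum class DWARFSection { Info, Abbrev, RngLists, StrOffsets, Unknown };

// Sketch: each object-file plugin strips its own prefix (".debug_",
// "__debug_", ...) and hands the bare suffix to one shared matcher so every
// format recognizes the same DWARF sections.
static DWARFSection ClassifyDWARFSuffix(llvm::StringRef suffix) {
  return llvm::StringSwitch<DWARFSection>(suffix)
      .Case("info", DWARFSection::Info)
      .Case("abbrev", DWARFSection::Abbrev)
      .Case("rnglists", DWARFSection::RngLists)
      .Case("str_offsets", DWARFSection::StrOffsets)
      .Default(DWARFSection::Unknown);
}
```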
2025-06-05 · Revert "[lldb] Set default object format to `MachO` in `ObjectFileMachO` (#142704)" · Jason Molenda · 1 file changed, -1/+0
This reverts commit d4d2f069dec4fb8b13447f52752d4ecd08d976d6. Temporarily reverting until we can find a way to get the correct ObjectFile set in Module's Triples without adding "-macho" to the triple string for each Module. This is breaking TestUniversal.py on the x86_64 macOS CI bots.
2025-06-04 · [lldb] Set default object format to `MachO` in `ObjectFileMachO` (#142704) · royitaqi · 1 file changed, -0/+1
# The Change
This patch sets the **default** object format of `ObjectFileMachO` to be `MachO` (instead of what currently ends up being `ELF`, see below). This should be **the correct thing to do**, because the code before the line of change has already verified the Mach-O header.

The existing logic:
* In `ObjectFileMachO`, the object format is unassigned by default, so it's `UnknownObjectFormat` (see [code](https://github.com/llvm/llvm-project/blob/54d544b83141dc0b20727673f68793728ed54793/llvm/lib/TargetParser/Triple.cpp#L1024)).
* The code then looks at load commands like `LC_VERSION_MIN_*` ([code](https://github.com/llvm/llvm-project/blob/54d544b83141dc0b20727673f68793728ed54793/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp#L5180-L5217)) and `LC_BUILD_VERSION` ([code](https://github.com/llvm/llvm-project/blob/54d544b83141dc0b20727673f68793728ed54793/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp#L5231-L5252)) and assigns the Triple's OS and Environment if they exist.
* If the above sets the Triple's OS to macOS, then the object format defaults to `MachO`; otherwise it is `ELF` ([code](https://github.com/llvm/llvm-project/blob/54d544b83141dc0b20727673f68793728ed54793/llvm/lib/TargetParser/Triple.cpp#L936-L937)).

# Impact
For **production usage** where Mach-O files have the said load commands (which is [expected](https://www.google.com/search?q=Are+mach-o+files+expected+to+have+the+LC_BUILD_VERSION+load+command%3F)), this patch won't change anything.
* **Important note**: It's not clear if there are legitimate production use cases where the Mach-O files don't have said load commands. If there are, the existing code thinks they are `ELF`; this patch changes that to `MachO`. This is considered a fix for such files.

For **unit tests**, this patch will simplify the yaml data by not requiring the said load commands.

# Test
See PR.
2025-06-04 · [lldb/cmake] Implicitly pass arguments to llvm_add_library (#142583) · Pavel Labath · 11 files changed, -34/+34
If we're not touching them, we don't need to do anything special to pass them along -- with one important caveat: due to how cmake arguments work, the implicitly passed arguments need to be specified before arguments that we handle. This isn't particularly nice, but the alternative is enumerating all arguments that can be used by llvm_add_library and the macros it calls (it also relies on implicit passing of some arguments to llvm_process_sources).
2025-06-02 · [lldb][AIX] Added support to load DW_ranges section (#142356) · Hemang Gadhavi · 1 file changed, -0/+1
This PR is in reference to porting LLDB on AIX. Link to discussions on llvm discourse and github: 1. https://discourse.llvm.org/t/port-lldb-to-ibm-aix/80640 2. https://github.com/llvm/llvm-project/issues/101657 The complete changes for porting are present in this draft PR: https://github.com/llvm/llvm-project/pull/102601 - [lldb] [AIX] Added support to load Dwarf Ranges(.dwranges) section.
2025-05-29 · [LLDB][Minidump] Fix bug in generating 64b memory minidumps (#141995) · Jacob Lalonde · 1 file changed, -1/+2
In #129307, we introduced reading and writing in chunks, and during the final revision of the PR I changed the behavior for 64b memory regions and did not test an actual 64b memory range. This caused LLDB to crash whenever we generated a 64b memory region. 64b regions have been a problem in testing for some time, as it's a waste of test resources to generate a 5gb+ Minidump. I will work with @clayborg and @labath to come up with a way to specify creating a 64b list instead of a 32b list (likely via the yamilizer).