path: root/llvm/lib/Object/WasmObjectFile.cpp
Age Commit message Author Files Lines
2024-02-15[Object][Wasm] Use offset instead of index for Global address and store size ↵Derek Schuff1-12/+20
(#81781) Currently the address reported by binutils for a global is its index; but its offset (in the file or section) is more useful for binary size attribution. This PR treats globals similarly to functions, and tracks their offset and size. It also centralizes the logic differentiating linked from object and dylib files (where section addresses are 0).
2024-02-09[llvm-nm][WebAssembly] Print function symbol sizes (#81315)Derek Schuff1-0/+14
nm already prints sizes for data symbols. Do that for function symbols too, and update objdump to also print size information. Implements item 3 from https://github.com/llvm/llvm-project/issues/76107
2024-02-08[Object][WebAssembly] Improve error on invalid relocation (#81203)Sam Clegg1-22/+18
See https://github.com/emscripten-core/emscripten/issues/21140
2024-02-08[Object][Wasm] Generate symbol info from name section names (#81063)Derek Schuff1-4/+45
Currently symbol info is generated from a linking section or from export names. This PR generates symbols in a WasmObjectFile from the name section as well, which allows tools like objdump and nm to show useful information for more linked binaries. There are some limitations: most notably that we don't assume any particular ABI, so we don't get detailed information about data symbols if the segments are merged (which is the default). Covers most of the desired functionality from #76107
2024-02-07[Object][Wasm] Use file offset for section addresses in linked wasm files ↵Derek Schuff1-1/+7
(#80529) Wasm has no unified virtual memory space as other object formats and architectures do, so previously WasmObjectFile reported 0 for all section addresses, and until 428cf71ff used section offsets for function symbols. Now we use file offsets for function symbols, and this change switches section addresses to do the same (in linked files). The main result of this is that objdump now reports VMAs in section listings, and also uses file offsets rather than section offsets when disassembling linked binaries (matching the behavior of other disassemblers and stack traces produced by browsers). To make this work, this PR also updates objdump's generation of synthetic fallback symbols to match lib/Object and also correctly plumbs symbol types for regular and dummy symbols through to the backend to avoid needing special knowledge of address 0. This also paves the way for generating symbols from name sections rather than symbol tables or imports (see #76107) by allowing the disassembler's synthetic fallback symbols to match the name-section generated symbols (in a followup PR).
2024-02-02[Object][Wasm] Move WasmSymbolInfo directly into WasmSymbol (NFC) (#80219)Derek Schuff1-9/+2
Move the WasmSymbolInfos from their own vector on the WasmLinkingData directly into the WasmSymbol object. Removing the const-ref to an external object allows the vector of WasmSymbols to be safely expanded/reallocated; generating symbol info from the name section will require this, as the numbers of function and data segment names are stored separately. This is a step toward generating symbol information from name sections for #76107
2024-01-25[Object][Wasm] Allow parsing of GC types in type and table sections (#79235)Derek Schuff1-28/+145
This change allows a WasmObjectFile to be created from a wasm file even if it uses typed funcrefs and GC types. It does not significantly change how lib/Object models its various internal types (e.g. WasmSignature, WasmElemSegment), so LLVM does not really "support" or understand such files, but it is sufficient to parse the type, global and element sections, discarding types that are not understood. This is useful for low-level binary tools such as nm and objcopy, which use only limited aspects of the binary (such as function definitions) or deal with sections as opaque blobs. This is done by allowing `WasmValType` to have a value of `OTHERREF` (representing any unmodeled reference type), and adding a field to `WasmSignature` indicating it's a placeholder for an unmodeled reference type (since there is a 1:1 correspondence between WasmSignature objects and types in the type section). Then the object file parsers for the type and element sections are expanded to parse encoded reference types and discard any unmodeled fields.
2024-01-17[WebAssembly] Use ValType instead of integer types to model wasm tables (#78012)Derek Schuff1-11/+12
LLVM models some features found in the binary format with raw integers and others with nested or enumerated types. This PR switches modeling of tables and segments to use wasm::ValType rather than uint32_t. This NFC change is in preparation for modeling more reference types, but IMO is also cleaner and closer to the spec.
2024-01-03Reland "[WebAssembly][Object]Use file offset as function symbol address for ↵Derek Schuff1-4/+13
linked files (#76198)" WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context. Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files. This is a reland of fc5f51cf with a fix for the MSan failure (it was not caused by this change, but it was revealed by the new tests).
2024-01-03Revert "[WebAssembly][Object]Use file offset as function symbol address for ↵Mitch Phillips1-12/+4
linked files (#76198)" This reverts commit fc5f51cf5af4364b38bf22e491d46e1e892ade0c. Reason: Broke the sanitizer buildbot - https://lab.llvm.org/buildbot/#/builders/5/builds/39751/steps/12/logs/stdio
2024-01-02[WebAssembly][Object]Use file offset as function symbol address for linked ↵Derek Schuff1-4/+12
files (#76198) WebAssembly doesn't have a single virtual memory space the way other object formats or architectures do, so "addresses" mean different things depending on the context. Function symbol addresses in object files are offsets from the start of the code section. This is good for linking and relocation. However when dealing with linked binaries, offsets from the start of the file/module are more often used (e.g. for stack traces in browsers), and are more useful for use cases like binary size attribution. This PR changes Object to use the file offset instead of the section offset for function symbols, but only for linked (non-DSO) files. This implements item number 4 from #76107
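The address rule described above can be sketched as follows. This is a minimal illustration, not the code in WasmObjectFile.cpp: the `FileKind` enum, the `CodeSectionInfo` struct, and the `functionSymbolAddress` function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a parsed wasm file; these are not the
// actual lib/Object types.
enum class FileKind { Relocatable, SharedLibrary, Linked };

struct CodeSectionInfo {
  uint64_t PayloadFileOffset; // file offset of the first byte of the payload
  uint64_t Size;              // payload size in bytes
};

// Function symbols in object files and DSOs keep their offset from the start
// of the code section; in linked files the offset is rebased to the file.
uint64_t functionSymbolAddress(FileKind Kind, const CodeSectionInfo &Code,
                               uint64_t OffsetInSection) {
  assert(OffsetInSection < Code.Size && "function must lie inside the section");
  if (Kind == FileKind::Linked)
    return Code.PayloadFileOffset + OffsetInSection;
  return OffsetInSection;
}
```

With a code section whose payload starts at file offset 100, a function at section offset 5 would be reported at 105 in a linked file and at 5 in a relocatable object or shared library.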
2023-12-26[WebAssembly] Add bounds check in parseCodeSection (#76407)DavidKorczynski1-0/+5
This is needed as otherwise `Ctx.Ptr` will be incremented to a position outside its available buffer, which is being used to read values e.g. https://github.com/llvm/llvm-project/blob/966d564e43e650b9c34f9c67829d3947f52add91/llvm/lib/Object/WasmObjectFile.cpp#L1469 Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=28856 Signed-off-by: David Korczynski <david@adalogics.com>
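The kind of check this commit describes might look like the sketch below. Only the name `Ctx.Ptr` comes from the commit message; the `ReadContext` layout and the `advance` helper are assumptions made for illustration and do not reproduce the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reader state: a cursor and the end of the readable buffer.
struct ReadContext {
  const uint8_t *Ptr;
  const uint8_t *End;
};

// Move the cursor forward by Size bytes, refusing (and leaving the cursor
// untouched) if that would step past the end of the buffer.
bool advance(ReadContext &Ctx, uint64_t Size) {
  assert(Ctx.Ptr <= Ctx.End && "cursor must start inside the buffer");
  if (Size > static_cast<uint64_t>(Ctx.End - Ctx.Ptr))
    return false;
  Ctx.Ptr += Size;
  return true;
}
```

Comparing `Size` against the remaining byte count, rather than computing `Ctx.Ptr + Size` first, avoids forming an out-of-range pointer at all.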
2023-12-21[WebAssembly][Object] Record section start offsets at start of payload (#76188)Derek Schuff1-1/+1
LLVM ObjectFile currently records the start offsets of sections as the start of the section header, whereas most other tools (WABT, emscripten, wasm-tools) record it as the start of the section content, after the header. This affects binutils tools such as objdump and nm, but not compilation/assembly (since that is driven by symbols and assembler labels which already have their values inside the section payload rather than in the header). This patch updates LLVM to match the other tools.
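For reference, a wasm section header is a one-byte section id followed by the payload size as a LEB128-encoded u32, so the two conventions differ by the header length. A small sketch of that arithmetic, where `sectionPayloadOffset` is a hypothetical helper and not an LLVM function:

```cpp
#include <cassert>
#include <cstdint>

// Given the file offset of a section header and the number of bytes used by
// its LEB128 size field, return the file offset where the payload begins.
uint64_t sectionPayloadOffset(uint64_t HeaderOffset, unsigned SizeFieldLength) {
  assert(SizeFieldLength >= 1 && SizeFieldLength <= 5 &&
         "a u32 LEB128 occupies 1 to 5 bytes");
  return HeaderOffset + 1 /* section id byte */ + SizeFieldLength;
}
```

For the first section of a module, which follows the 8-byte magic and version preamble, the payload would start at offset 10 with a minimal one-byte size field and at offset 14 with a size field padded to five bytes.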
2023-12-20[WebAssembly] Add symbol information for shared libraries (#75238)Sam Clegg1-3/+47
The current (experimental) spec for WebAssembly shared libraries does not include a full symbol table like the object format. This change extracts symbol information from the normal wasm exports. This is the first step in having the linker report undefined symbols when linking with shared libraries. The current behaviour is to ignore all undefined symbols when linking with `-pie` or `-shared`. See https://github.com/emscripten-core/emscripten/issues/18198
2023-12-11[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)Kazu Hirata1-1/+1
This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
2023-10-03[WebAssembly] Allow absolute symbols in the linking section (symbol table) ↵Sam Clegg1-9/+13
(#67493) Fixes a crash in `-Wl,-emit-relocs` where the linker was not able to write linker-synthetic absolute symbols to the symbol table. This change adds a new symbol flag (`WASM_SYMBOL_ABS`), which means that the symbol's offset is absolute and not relative to a given segment. Such symbols include `__stack_low` and `__stack_high`. Note that wasm object files never contain such symbols, only binaries linked with `-Wl,-emit-relocs`. Fixes: #67111
2023-08-25[llvm-nm][WebAssembly] Report the size of data symbolsSam Clegg1-1/+1
Fixes: https://github.com/llvm/llvm-project/issues/58839 Differential Revision: https://reviews.llvm.org/D158799
2023-07-27[WebAssembly][Objcopy] Write output section headers identically to inputsDerek Schuff1-0/+4
Previously when objcopy generated section headers, it padded the LEB that encodes the section size out to 5 bytes, matching the behavior of clang. This is correct, but results in a binary that differs from the input. This can sometimes have undesirable consequences (e.g. breaking source maps). This change makes the object reader remember the size of the LEB encoding in the section header, so that llvm-objcopy can reproduce it exactly. For sections not read from an object file (e.g. that llvm-objcopy is adding itself), pad to 5 bytes. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D155535
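To make the difference between a minimal and a padded size field concrete, here is a self-contained ULEB128 sketch. It is written from the encoding rules rather than copied from LLVM, and `encodeULEB128` and `decodeULEB128` are local illustrative functions, not the LLVM Support routines of similar name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode Value as ULEB128. If PadTo is nonzero, the result is extended with
// continuation bytes so that it is at least PadTo bytes long.
std::vector<uint8_t> encodeULEB128(uint32_t Value, unsigned PadTo = 0) {
  assert(PadTo <= 5 && "a u32 needs at most 5 LEB128 bytes");
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0 || Out.size() + 1 < PadTo)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
  // Padding bytes carry no payload; only the last one clears the high bit.
  while (Out.size() < PadTo)
    Out.push_back(Out.size() + 1 < PadTo ? 0x80 : 0x00);
  return Out;
}

struct DecodedULEB128 {
  uint32_t Value;
  unsigned Length; // bytes consumed; a reader can keep this to round-trip
};

// Decode a u32 ULEB128 from the front of Bytes, reading at most 5 bytes.
DecodedULEB128 decodeULEB128(const std::vector<uint8_t> &Bytes) {
  DecodedULEB128 Result{0, 0};
  unsigned Shift = 0;
  while (Result.Length < Bytes.size() && Result.Length < 5) {
    uint8_t Byte = Bytes[Result.Length++];
    Result.Value |= static_cast<uint32_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  return Result;
}
```

Both `encodeULEB128(3)` and `encodeULEB128(3, 5)` decode to the value 3, but the first is one byte and the second is five. Keeping the decoded `Length` alongside the value is what lets a writer reproduce the input bytes exactly.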
2023-07-11[WebAssembly] Support `annotate` clang attributes for marking functions.Brendan Dahl1-0/+2
Annotation attributes may be attached to a function to mark it with custom data that will be contained in the final Wasm file. The annotation causes a custom section named "func_attr.annotate.<name>.<arg0>.<arg1>..." to be created that will contain each function's index value that was marked with the annotation. A new patchable relocation type for function indexes had to be created so the custom section could be updated during linking. Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D150803
2023-06-26Move SubtargetFeature.h from MC to TargetParserJob Noorman1-1/+1
SubtargetFeature.h is currently part of MC while it doesn't depend on anything in MC. Since some LLVM components might have the need to work with target features without necessarily needing MC, it might be worthwhile to move SubtargetFeature.h to a different location. This will reduce the dependencies of said components. Note that I chose TargetParser as the destination because that's where Triple lives and SubtargetFeatures feels related to that. This issue came up during a JITLink review (D149522). JITLink would like to avoid a dependency on MC while still needing to store target features. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D150549
2023-02-27[lld][WebAssembly] Fix handling of mixed strong and weak referencesSam Clegg1-1/+12
When adding undefined symbols to the symbol table, if the existing reference is weak, replace the symbol flags with (potentially) non-weak binding. Fixes: https://github.com/llvm/llvm-project/issues/60829 Differential Revision: https://reviews.llvm.org/D144747
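The rule stated in this commit can be sketched in a few lines. The constants below are local to the example (chosen to resemble the wasm symbol flag layout) and `mergeUndefinedFlags` is a hypothetical function; the actual lld change is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values, defined locally for this sketch.
constexpr uint32_t BindingMask = 0x3;
constexpr uint32_t BindingWeak = 0x1;
constexpr uint32_t Undefined = 0x10;

// Combine the flags of an existing undefined reference with a new one: a weak
// existing reference yields to the incoming flags, which may be non-weak.
uint32_t mergeUndefinedFlags(uint32_t Existing, uint32_t Incoming) {
  assert((Existing & Undefined) && (Incoming & Undefined) &&
         "both references must be undefined symbols");
  if ((Existing & BindingMask) == BindingWeak)
    return Incoming;
  return Existing;
}
```

Under this rule a strong reference wins regardless of the order in which the weak and strong references are seen.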
2023-02-07[NFC][TargetParser] Remove llvm/ADT/Triple.hArchibald Elliott1-1/+1
I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
2023-01-16[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attributeElena Lepilkina1-1/+1
Differential Revision: https://reviews.llvm.org/D139553
2022-12-09Revert D139098 "[Alignment] Use Align for ObjectFile::getSectionAlignment"Guillaume Chatelet1-2/+2
This breaks lld. This reverts commit 10c47465e2505ddfee4e62a2ab2e535abea3ec56.
2022-12-09[Alignment] Use Align for ObjectFile::getSectionAlignmentGuillaume Chatelet1-2/+2
Differential Revision: https://reviews.llvm.org/D139098
2022-08-31[lld][WebAssemby] Allow import module names to be empty strings.Dan Gohman1-12/+4
The component-model [canonical ABI] is currently using import names with empty strings. Remove the special cases for empty strings from WasmObjectFile.cpp so that they can pass through as-is. [canonical ABI]: https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md Differential Revision: https://reviews.llvm.org/D133037
2022-07-17[llvm] Modernize bool literals (NFC)Kazu Hirata1-1/+1
Identified with modernize-use-bool-literals.
2022-06-23[WebAssembly][Object] Remove requirement that objects must have code sectionsDerek Schuff1-13/+3
When parsing name and linking sections, we currently require that the object must have a code section (it seems that this was intended to verify section ordering). However it can be useful for binaries to have their code sections stripped out (e.g. if we just want the debug info). In that case we need the rest of the known sections (so e.g. we know how many functions there are, to verify the name section) but not the actual code. I've removed the restriction completely. I think this is OK because the section-parsing code already checks function and global indices in many places for validity and will return appropriate errors if the relevant sections are missing. Also we can't just replace the requirement of seeing a code section with a requirement that we see a function or global section, because a binary may just not have any functions or globals. But there's only a problem if the name or linking section tries to name a nonexistent function. Part of a fix for https://github.com/emscripten-core/emscripten/issues/13084 Differential Revision: https://reviews.llvm.org/D128094
2022-06-20[llvm] Don't use Optional::getValue (NFC)Kazu Hirata1-1/+1
2022-06-07[WebAssembly] Add WASM_SEC_LAST_KNOWN to BinaryFormat section types list [NFC]Derek Schuff1-1/+1
There are 3 places where we were using WASM_SEC_TAG as the "last" known section type, which requires updating (or leaves a bug) when a new known section type is added. Instead add a "last type" to the enum for this purpose. Differential Revision: https://reviews.llvm.org/D127164
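The sentinel pattern described here looks roughly like the following. The enum is a shortened, hypothetical stand-in for the real section type list, and `isKnownSection` and `sectionName` are illustrative helpers.

```cpp
#include <cassert>

// Shortened, hypothetical section list. The sentinel aliases the last real
// entry, so adding a new section only requires updating this one line.
enum SectionType : unsigned {
  SEC_CUSTOM = 0,
  SEC_TYPE = 1,
  SEC_CODE = 2,
  SEC_TAG = 3,
  SEC_LAST_KNOWN = SEC_TAG,
};

// Range checks compare against the sentinel instead of naming SEC_TAG.
bool isKnownSection(unsigned Id) { return Id <= SEC_LAST_KNOWN; }

const char *sectionName(unsigned Id) {
  assert(isKnownSection(Id) && "unknown section id");
  static const char *const Names[SEC_LAST_KNOWN + 1] = {"CUSTOM", "TYPE",
                                                        "CODE", "TAG"};
  return Names[Id];
}
```

Code that sizes arrays or validates ids through `SEC_LAST_KNOWN` keeps working when a new entry is appended and the sentinel is moved.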
2022-05-27[WebAssembly] Consolidate sectionTypeToString in BinaryFormat [NFC]Derek Schuff1-21/+3
Currently there are 2 duplicate implementations, and I want to add a use in a 3rd place. Combine them in lib/BinaryFormat so they can be shared. Also update toString for symbol and reloc types to use StringRef. Differential Revision: https://reviews.llvm.org/D126553
2022-03-15[WebAssembly] Fix asan issue from https://reviews.llvm.org/D121349Sam Clegg1-0/+1
2022-03-14[WebAssembly] Fix asan issue from https://reviews.llvm.org/D121349Sam Clegg1-0/+1
2022-03-14[WebAssembly] Second phase of implemented extended const proposalSam Clegg1-21/+56
This change continues to lay the ground work for supporting extended const expressions in the linker. The included test covers object file reading and writing and the YAML representation. Differential Revision: https://reviews.llvm.org/D121349
2022-02-10Cleanup LLVMObject headersserge-sans-paille1-2/+0
Most notably, llvm/Object/Binary.h no longer includes llvm/Support/MemoryBuffer.h llvm/Object/MachOUniversal*.h no longer include llvm/Object/Archive.h llvm/Object/TapiUniversal.h no longer includes llvm/Object/TapiFile.h llvm-project preprocessed size: before: 1068185081 after: 1068324320 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D119457
2021-11-05[NFC] Inclusive language: Remove instances of master in URLsQuinn Pham1-1/+1
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or redirecting to the new URL. Reviewed By: #libc, ldionne, mehdi_amini Differential Revision: https://reviews.llvm.org/D113186
2021-10-15[WebAssembly] Add import info to `dylink` section of shared librariesSam Clegg1-0/+8
See https://github.com/WebAssembly/tool-conventions/pull/175 Differential Revision: https://reviews.llvm.org/D111345
2021-10-12[WebAssembly] Make EH work with dynamic linkingHeejin Ahn1-2/+4
This makes Wasm EH work with dynamic linking. So far we were only able to handle destructors, which do not use any tags or LSDA info. 1. This uses `TargetExternalSymbol` for `GCC_except_tableN` symbols, which points to the address of per-function LSDA info. It is more convenient to use than `MCSymbol` because it can take additional target flags. 2. When lowering `wasm_lsda` intrinsic, if PIC is enabled, make the symbol relative to `__memory_base` and generate the `add` node. If PIC is disabled, continue to use the absolute address. 3. Make tag symbols (`__cpp_exception` and `__c_longjmp`) undefined in the backend, because it is hard to make it work with dynamic linking's loading order. Instead, we make all tag symbols undefined in the LLVM backend and import it from JS. 4. Add support for undefined tags to the linker. Companion patches: - https://github.com/WebAssembly/binaryen/pull/4223 - https://github.com/emscripten-core/emscripten/pull/15266 Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D111388
2021-10-05[WebAssembly] Remove WasmTagTypeHeejin Ahn1-10/+21
This removes `WasmTagType`. `WasmTagType` contained an attribute and a signature index:
```
struct WasmTagType {
  uint8_t Attribute;
  uint32_t SigIndex;
};
```
Currently the attribute field is not used and reserved for future use, and always 0. And that this class contains `SigIndex` as its property is a little weird in the first place, because the tag type's signature index is not an inherent property of a tag but rather a reference to another section that changes after linking. This makes tag handling in the linker also weird that tag-related methods are taking both `WasmTagType` and `WasmSignature` even though `WasmTagType` contains a signature index. This is because the signature index changes in linking so it doesn't have any info at this point. This instead moves `SigIndex` to `struct WasmTag` itself, as we did for `struct WasmFunction` in D111104. In this CL, in lib/MC and lib/Object, this now treats tag types in the same way as function types. Also in YAML, this removes `struct Tag`, because now it only contains the tag index. Also tags set `SigIndex` in `WasmImport` union, as functions do. I think this makes things simpler and makes tag handling more in line with function handling. These two share similar properties in that both of them have signatures, but they are kind of nominal so having the same signature doesn't mean they are the same element. Also a drive-by fix: the reserved 'attribute' part's encoding changed from uleb32 to uint8 a while ago. This was fixed in lib/MC and lib/Object but not in YAML. This doesn't change object files because the field's value is always 0 and its encoding is the same in both encodings. This is effectively NFC; I didn't mark it as such just because it changed YAML test results. Reviewed By: sbc100, tlively Differential Revision: https://reviews.llvm.org/D111086
2021-10-04[Object][WebAssemlby] Report function types (signatures). NFCSam Clegg1-8/+9
This simplifies the code in a number of ways and avoids having to track functions and their types separately. Differential Revision: https://reviews.llvm.org/D111104
2021-09-29[WebAssemlby][Object] Fix dead code in WasmObjectFile.cppSam Clegg1-2/+1
I introduced this by mistake in https://reviews.llvm.org/D109595. Differential Revision: https://reviews.llvm.org/D110717
2021-09-14[WebAssembly] Allow import and export of TLS symbols between DSOsSam Clegg1-0/+7
We previously had a limitation that TLS variables could not be exported (and therefore could also not be imported). This change removed that limitation. Differential Revision: https://reviews.llvm.org/D108877
2021-09-12[WebAssembly] Convert to new "dylink.0" section formatSam Clegg1-1/+52
This format is based on sub-sections (like the "linking" and "name" sections) and is therefore easier to extend going forward. spec change: https://github.com/WebAssembly/tool-conventions/pull/170 binaryen change: https://github.com/WebAssembly/binaryen/pull/4141 wabt change: https://github.com/WebAssembly/wabt/pull/1707 emscripten change: https://github.com/emscripten-core/emscripten/pull/15019 Differential Revision: https://reviews.llvm.org/D109595
2021-09-10[WebAssembly][libObject] Avoid re-use of Section object during parsingSam Clegg1-1/+1
The re-use of this struct across iterations of the loop was causing fields (specifically Name) to be incorrectly shared between multiple sections. Differential Revision: https://reviews.llvm.org/D108984
2021-07-19[WebAssembly] Support R_WASM_MEMORY_ADDR_TLS_SLEB64 for wasm64Wouter van Oortmerssen1-0/+1
Also fixed TLS tests swapping addr & value in store op Differential Revision: https://reviews.llvm.org/D106096
2021-06-21[WebAssembly] Make tag attribute's encoding uint8Heejin Ahn1-2/+2
This changes the encoding of the `attribute` field, which currently only contains the value `0` denoting this tag is for an exception, from `varuint32` to `uint8`. This field is effectively unused at the moment and reserved for future use, and it is not likely to need `varuint32` even in future. See https://github.com/WebAssembly/exception-handling/pull/162. This does not change any encoded binaries because `0` is encoded in the same way both in `varuint32` and `uint8`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104571
2021-06-17[WebAssembly] Rename event to tagHeejin Ahn1-59/+59
We recently decided to change 'event' to 'tag', and 'event section' to 'tag section', out of the rationale that the section contains a generalized tag that references a type, which may be used for something other than exceptions, and the name 'event' can be confusing in the web context. See - https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130 - https://github.com/WebAssembly/exception-handling/pull/161 Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104423
2021-05-20[WebAssembly] Fix PIC/GOT codegen for wasm64Wouter van Oortmerssen1-0/+1
__table_base is now 64-bit, since in LLVM it represents a function pointer offset. __table_base32 is a copy in wasm32 for use in elem init expr, since no truncation may be used there. New reloc R_WASM_TABLE_INDEX_REL_SLEB64 added. Differential Revision: https://reviews.llvm.org/D101784
2021-05-12[lld][WebAssembly] Allow data symbols to extend past end of segmentSam Clegg1-3/+6
This fixes a bug with string merging with string symbols that contain NULLs, as is the case in the `merge-string.s` test. The bug only showed when we run with `--relocatable` and then try to read the resulting object back in. In this case we would end up with string symbols that extend past the end of the segment in which they live. The problem comes from the fact that sections which are flagged as string mergeable assume that all strings are NULL terminated. The merging algorithm will drop trailing chars that follow a NULL since they are essentially unreachable. However, the "size" attribute (in the symbol table) of such a truncated symbol is not updated, resulting in a symbol size that can overlap the end of the segment. I verified that this can happen in ELF too given the right conditions and that it's harmless enough. In practice strings that contain embedded nulls should not be part of a mergeable section. Differential Revision: https://reviews.llvm.org/D102281
2021-05-10Reland: "[lld][WebAssembly] Initial support merging string data"Sam Clegg1-2/+2
This change was originally landed in: 5000a1b4b9edeb9e994f2a5b36da8d48599bea49 It was reverted in: 061e071d8c9b98526f35cad55a918a4f1615afd4 This change adds support for a new WASM_SEG_FLAG_STRINGS flag in the object format which works in a similar fashion to SHF_STRINGS in the ELF world. Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker, merging is only performed at `-O1` and above. This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828, although crucially it does not currently support debug sections because they are not represented by data segments (they are custom sections). Differential Revision: https://reviews.llvm.org/D97657
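Tail merging, as in the "lo" and "hello" example above, can be illustrated with a small standalone sketch. This is a simple quadratic version written for clarity; it is not the algorithm used by lld, and `MergedStrings` and `tailMerge` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct MergedStrings {
  std::string Data;                           // NUL-terminated strings, packed
  std::map<std::string, std::size_t> Offsets; // string -> offset into Data
};

// Place each string in Data, reusing the tail of an already placed string
// whenever the new string is a suffix of it.
MergedStrings tailMerge(std::vector<std::string> Strings) {
  // Longest first, so a string is always seen after any string it can reuse.
  std::stable_sort(Strings.begin(), Strings.end(),
                   [](const std::string &A, const std::string &B) {
                     return A.size() > B.size();
                   });
  MergedStrings Result;
  std::vector<std::pair<std::string, std::size_t>> Placed;
  for (const std::string &S : Strings) {
    assert(S.find('\0') == std::string::npos && "no embedded NULs expected");
    if (Result.Offsets.count(S))
      continue; // exact duplicate
    bool Merged = false;
    for (const auto &Host : Placed) {
      std::size_t Skip = Host.first.size() - S.size(); // Host is never shorter
      if (Host.first.compare(Skip, S.size(), S) == 0) {
        Result.Offsets[S] = Host.second + Skip; // share the host's tail and NUL
        Merged = true;
        break;
      }
    }
    if (!Merged) {
      Result.Offsets[S] = Result.Data.size();
      Placed.emplace_back(S, Result.Data.size());
      Result.Data += S;
      Result.Data.push_back('\0');
    }
  }
  return Result;
}
```

Given the inputs "lo", "hello" and "world", only "hello" and "world" are stored, and "lo" resolves to offset 3 inside "hello" so that it shares the same terminating NUL.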