| Age | Commit message | Author | Files | Lines |
|
|
|
This PR reduces outliers in terms of runtime performance, by asking the
OS to prefetch memory-mapped input files in advance, as early as
possible. I have implemented the Linux aspect, but I have only
tested this on Windows 11 version 24H2, with an active security stack
enabled. The machine is an AMD Threadripper PRO 3975WX 32c/64t with 128
GB of RAM and a Samsung 990 PRO SSD.
I have used an Unreal Engine-based game to profile the link times. Here's
a quick summary of the input data:
```
Summary
--------------------------------------------------------------------------------
4,169 Input OBJ files (expanded from all cmd-line inputs)
26,325,429,114 Size of all consumed OBJ files (non-lazy), in bytes
9 PDB type server dependencies
0 Precomp OBJ dependencies
350,516,212 Input debug type records
18,146,407,324 Size of all input debug type records, in bytes
15,709,427 Merged TPI records
4,747,187 Merged IPI records
56,408 Output PDB strings
23,410,278 Global symbol records
45,482,231 Module symbol records
1,584,608 Public symbol records
```
In normal conditions - meaning all the pages are already in RAM - this
PR has no noticeable effect:
```
>hyperfine "before\lld-link.exe @Game.exe.rsp" "with_pr\lld-link.exe @Game.exe.rsp"
Benchmark 1: before\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 29.689 s ± 0.550 s [User: 259.873 s, System: 37.936 s]
Range (min … max): 29.026 s … 30.880 s 10 runs
Benchmark 2: with_pr\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 29.594 s ± 0.342 s [User: 261.434 s, System: 62.259 s]
Range (min … max): 29.209 s … 30.171 s 10 runs
Summary
with_pr\lld-link.exe @Game.exe.rsp ran
1.00 ± 0.02 times faster than before\lld-link.exe @Game.exe.rsp
```
However, in production conditions, we're typically working with the
Unreal Engine Editor, with external DCC tools like Maya and Houdini; we have
several instances of Visual Studio open, VSCode with Rust analyzer, etc.
All this means that between code change iterations, most of the input
OBJ files might already have been evicted from the Windows RAM cache.
Consequently, in the following test, I've simulated the worst-case
condition by evicting all data from RAM with
[RAMMap64](https://learn.microsoft.com/en-us/sysinternals/downloads/rammap)
(i.e. `RAMMap64.exe -E[wsmt0]` with a 5-sec sleep at the end to ensure
the System thread actually has time to evict the pages):
```
>hyperfine -p cleanup.bat "before\lld-link.exe @Game.exe.rsp" "with_pr\lld-link.exe @Game.exe.rsp"
Benchmark 1: before\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 48.124 s ± 1.770 s [User: 269.031 s, System: 41.769 s]
Range (min … max): 46.023 s … 50.388 s 10 runs
Benchmark 2: with_pr\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 34.192 s ± 0.478 s [User: 263.620 s, System: 40.991 s]
Range (min … max): 33.550 s … 34.916 s 10 runs
Summary
with_pr\lld-link.exe @Game.exe.rsp ran
1.41 ± 0.06 times faster than before\lld-link.exe @Game.exe.rsp
```
This is similar to the work done in MachO in
https://github.com/llvm/llvm-project/pull/157917
|
|
Original crash was observed in Chromium, in [1]. The problem occurs in
elf::isAArch64BTILandingPad because it didn't handle synthetic sections,
which can have a nullptr as a buf, so it crashed while trying to read
that buf.
After fixing that, a second issue occurs: when the patched code grows too
much, it ends up far away from the short jump, and the current implementation
assumes an R_AARCH64_JUMP26 will be enough.
This PR changes the implementation to:
(a) In isAArch64BTILandingPad, check whether a section is synthetic, and
assume that it will NOT contain a landing pad, avoiding the buffer check;
(b) Suppress the size rounding for thunks that precede a section
(making the situation less likely to happen);
(c) Reimplement the patch using an R_AARCH64_ABS64 in case the
patched code is still far away.
[1] https://issues.chromium.org/issues/440019454
---------
Co-authored-by: Tarcisio Fischer <tarcisio.fischer@arm.com>
|
|
This uses the same syntax as the ELF linker (added in
14e3bec8fc3e1f10c3dc57277ae3dbf9a4087b1c), mapping it to the recently
added COFF linker flag in
759fb0a224e85c01fffcd42b1e71a4bea6fc757e.
|
|
This patch adds the /linkreprofullpathrsp flag with the same behaviour
as link.exe. This flag emits a file containing the full paths to each
object passed to the link line.
This is used in particular when linking Arm64X binaries, as you need the
full path to all the Arm64 objects that were used in a standard Arm64
build.
See:
https://learn.microsoft.com/en-us/cpp/build/reference/link-repro-full-path-rsp
for the Microsoft documentation of the flag.
Relands #165449
|
|
Reverts llvm/llvm-project#165449 due to the new test failing in Linux
pre-commit CI.
|
|
This patch adds the /linkreprofullpathrsp flag with the same behaviour
as link.exe. This flag emits a file containing the full paths to each
object passed to the link line.
This is used in particular when linking Arm64X binaries, as you need the
full path to all the Arm64 objects that were used in a standard Arm64
build.
See:
https://learn.microsoft.com/en-us/cpp/build/reference/link-repro-full-path-rsp
for the Microsoft documentation of the flag.
|
|
... so that `local:*;` will be lexed as three tokens instead of a single
one in a version node. This is used by both version scripts and dynamic
lists. Fix #174363
In addition, clean up special code for space-separated `local :` and `global :`.
This patch brings our lexer behavior closer to GNU ld. While GNU ld
additionally rejects more characters like `~/+,=`, we don't implement
this additional validation.
Pull Request: https://github.com/llvm/llvm-project/pull/174530
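To illustrate the lexing change, here is a minimal standalone sketch, not
LLD's actual ScriptLexer. The function name `tokenizeVersionNode` and the
punctuation set are assumptions made for this example; the point is only that
`local:*` splits into `local`, `:` and `*`, with `;` as its own token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Assumed punctuation set for this sketch; the real lexer's rules differ.
static bool isPunct(char c) {
  return c == ':' || c == ';' || c == '{' || c == '}';
}

// Hypothetical helper: split a version-node fragment into tokens, treating
// each punctuation character as a single-character token.
std::vector<std::string> tokenizeVersionNode(std::string_view text) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < text.size()) {
    char c = text[i];
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++i; // skip whitespace between tokens
      continue;
    }
    if (isPunct(c)) {
      tokens.emplace_back(1, c); // one-character token
      ++i;
      continue;
    }
    // Consume a run of non-punctuation, non-space characters.
    std::size_t start = i;
    while (i < text.size() && !isPunct(text[i]) &&
           !std::isspace(static_cast<unsigned char>(text[i])))
      ++i;
    tokens.emplace_back(text.substr(start, i - start));
  }
  return tokens;
}
```

With this scheme, `local:*;` and the space-separated `local : *;` produce the
same token sequence, which is why no special case is needed for the latter.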
|
|
Commit 21a4710c67 added this for ELF; this patch does the same for COFF.
The differences in codegen were noticed whilst testing DTLTO for COFF.
|
|
The other way is a dependency cycle.
|
|
This patch implements support for handling archive members in DTLTO.
Unlike ThinLTO, where archive members are passed as in-memory buffers,
DTLTO requires archive members to be materialized as individual files on
the filesystem.
This is necessary because DTLTO invokes clang externally, which expects
file-based inputs.
To support this, this implementation identifies archive members among
the input files, saves them to the filesystem, and updates their
module_id to match their file paths.
|
|
Although mergeRels is called prior to using this size for final layout,
Writer::setReservedSymbolSections uses this in order to set the value of
__rel[a]_iplt_end and, downstream in Morello LLVM, __rel[a]_dyn_end.
Currently none of the relocations that can exist when static linking (as
is the case when these symbols are defined) are sharded, but a future
commit will change this for R_AARCH64_AUTH_RELATIVE, and similarly
R_MORELLO_RELATIVE is sharded downstream in Morello LLVM. Make sure we
compute the right size when called prior to mergeRels, and add a
regression test to demonstrate that R_AARCH64_AUTH_RELATIVE still gets
the right __rel[a]_iplt_end in future even when sharding is adopted.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/173285
|
|
Other than the ordering requirements that remain between sections, this
abstracts the details of how these sections are implemented.
Note that isNeeded already checks relocsVec for both section types, so
finalizeSynthetic can call it before mergeRels just fine.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171203
|
|
This call to addRelativeReloc is the same as the one at the end of the
function, so skip the relrDyn code for this case and add the special
out-of-bounds handling code to the end of the function. This makes it
obvious where MTE globals differ in behaviour rather than having to
compare the two different implementations.
This also adds a comment documenting why relrDyn isn't used, and in it
highlights that it's probably safe to use relrDyn so long as the offset
is within the symbol's bounds.
Reviewers: pcc, kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171181
|
|
This makes addRelativeReloc a bit more readable and uniform, as well as
the relrAuthDyn call in RelocScan::process.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171178
|
|
This is just a copy of InputSectionBase::addReloc, so we can just
forward to that rather than poking into the internals. Whilst here, move
the implementation to the header so it can be inlined.
This is helpful downstream for CHERI, as static relocations to emit an
entire capability (whether for a relative relocation or for an undefined
weak symbol) need to be split in two, one per word, as getRelocTargetVA
only returns a uint64_t. Having a single function that pushes to
InputSectionBase's static relocations array centralises that so the
outside world can pretend it's a singular relocation, and internally it
gets mapped to the pair.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171177
|
|
There's no need to poke into the internals, we can just use the more
abstract member function like everywhere else in LLD.
Reviewers: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171176
|
|
When a stub .so file contains
```
A: B
```
and `A` is defined in bitcode that's pulled in for LTO, but both `A` and
`B` are removed in `LTO::linkRegularLTO` because they are dead:
https://github.com/llvm/llvm-project/blob/24297bea9672722d8fbaaff137b301b0becaae9c/llvm/lib/LTO/LTO.cpp#L1042-L1054
Then the symbol `A` becomes undefined after LTO, `processStubLibraries`
tries to import `A` from JS, and tries to export its dependency `B`:
https://github.com/llvm/llvm-project/blob/24297bea9672722d8fbaaff137b301b0becaae9c/lld/wasm/Driver.cpp#L1108-L1109
But `B` is gone, causing this error:
```console
wasm-ld: error: ....: undefined symbol: B. Required by A
```
This PR checks if the symbol is used in regular objects before trying to
export its dependencies, ensuring the case above doesn't crash the
linker.
|
|
Rather than trying to infer deep down in AArch64::relocate whether we
need to actually write anything or not, we should instead mark the
relocations that we no longer want so we don't actually apply them. This
is similar to how X86_64::deleteFallThruJmpInsn works, although given
the target is still valid we don't need to mess with the offset, just
the expr.
This is mostly NFC, but if the addend ever exceeded 32 bits and then
came back in range, previously we'd pointlessly write it, whereas now we
do not. We also validate that the addend is actually 32-bit, so we will
catch errors in our implementation rather than silently assuming any
relocations where that isn't true have been moved to .rela.dyn.
Reviewers: kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171192
|
|
The current implementation in addRelativeReloc makes it look like we're
writing the symbol's VA + addend to the section, because that's what the
given relocation will evaluate to, but we're supposed to be writing the
negated original addend (since the relative relocation's addend will be
the sum of the symbol's VA and the original addend). This only works
because deep down in AArch64::relocate we throw away the computed value
and peek back inside the relocation to extract the addend and negate it.
Do this properly by having a relocation that evaluates to the right
value instead.
Reviewers: kovdan01, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171182
|
|
The only difference between these calls is whether rel or type is passed
as the first argument, but AArch64::getDynRel returns type unchanged for
R_AARCH64_AUTH_ABS64, so they are the same.
Reviewers: MaskRay, kovdan01
Pull Request: https://github.com/llvm/llvm-project/pull/171179
|
|
After enabling DFLTCC in zlib-ng for s390x this test starts failing,
because slightly better compression is produced at level 1. Add 1c as a
permissible output.
|
|
This should fix buildbot failures after
759fb0a224e85c01fffcd42b1e71a4bea6fc757e.
|
|
This adds support for FatLTO to COFF targets in clang and lld.
The changes are adapted from
https://github.com/llvm/llvm-project/commit/610fc5cbcc8b68879c562f6458608afe2473ab7f
and
https://github.com/llvm/llvm-project/commit/14e3bec8fc3e1f10c3dc57277ae3dbf9a4087b1c
but much smaller because it just needed the COFF-specific parts wired
in, and I tried my best to adapt the pre-existing ELF tests for the COFF
version.
My main goal is to be able to use this for shipping pre-built
https://github.com/XboxDev/nxdk container images someday, which uses the
`i386-pc-win32` target.
|
|
Previously, LLD would always set the implicit entry point for DLLs to
the symbol that is prefixed with an underscore. However, mingw-w64
defines it without that underscore.
This change fixes that by adding a special branch for MinGW. Also, it
simplifies tests that use the MinGW-style DLL entry symbol by skipping the
entry point argument.
Note that tests that use the MSVC-style DLL entry symbol with LLD in MinGW
mode will now require an explicit entry point. I believe this is sensible.
When an explicit entry point is passed, i.e. LLD is called by Clang or
GCC, there will be no observable difference.
Fixes https://github.com/llvm/llvm-project/issues/171441
|
|
llvm-objdump was missing "literal pool symbol address" comments for
arm64_32 stub disassembly. Fixed by adding 32-bit instruction support
(LDRWui, ADDWri, LDRWl) to AArch64ExternalSymbolizer and aarch64_32
architecture checks to MachODump.cpp symbolization code.
Fixes #49288
|
|
This option will cause the linker to emit LLVM bitcode instead of an
object file. The implementation is similar to that of the corresponding
option in the ELF backend. This only works with LLD and will not work
with the gold plugin.
|
|
|
|
NEEDS_TLSGD_TO_IE is only ever set when the symbol is preemptible, in
which case addTpOffsetGotEntry will just add the symbol to the GOT and
emit a symbolic tlsGotRel anyway, so there is no need to give it its own
special case.
As well as simplifying the code upstream, this is useful downstream for
Morello, which doesn't really have a proper GD/IE-to-LE relaxation, and
so for GD-to-IE can benefit from being able to use the optimisations
addTpOffsetGotEntry has for non-preemptible symbols, rather than having
to reimplement them here.
|
|
This is a follow up of the discussions in
https://github.com/llvm/llvm-project/pull/163497
|
|
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
|
|
|
|
cuEntries was sorted indirectly through a separate `cuIndices`.
Eliminate cuIndices for simplicity.
Linking chromium_framework from `#48001` with `-no_uuid` gives an identical
executable using this patch.
|
|
... as they are closely related. Also improve the comments.
|
|
(#169273)
This is achieved by using some of the bits of RelType to tag vendor namespaces. This change also adds a relocation iterator for RISCV that folds vendor namespaces into the RelType of the following relocation.
This patch is extracted from the implementation of RISCV vendor-specific relocations in the CHERIoT LLVM downstream: https://github.com/CHERIoT-Platform/llvm-project/commit/3d6d6f7d9480b590731cbcf4b4817e1fa3049854
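As a rough illustration of folding a vendor namespace into the following
relocation's type, here is a minimal sketch. The bit layout (`kTypeBits`), the
`RawReloc` struct and the `vendorTag` field are all hypothetical and are not
taken from the patch; only the value 191 for R_RISCV_VENDOR comes from the
RISC-V psABI.

```cpp
#include <cstdint>
#include <vector>

// Assumed layout for illustration only: the low 8 bits hold the ELF
// relocation type, the bits above hold a small vendor tag (0 = standard).
constexpr uint32_t kTypeBits = 8;
constexpr uint32_t kRiscvVendor = 191; // R_RISCV_VENDOR in the RISC-V psABI

struct RawReloc {
  uint32_t type;      // relocation type as stored in the object file
  uint32_t vendorTag; // hypothetical tag; a real linker would derive it from
                      // the vendor symbol named by R_RISCV_VENDOR
};

// Hypothetical folding pass: each R_RISCV_VENDOR entry is consumed and its
// tag is merged into the type of the relocation that follows it.
std::vector<uint32_t> foldVendorRelocs(const std::vector<RawReloc> &relocs) {
  std::vector<uint32_t> out;
  uint32_t pendingTag = 0;
  for (const RawReloc &r : relocs) {
    if (r.type == kRiscvVendor) {
      pendingTag = r.vendorTag; // remember the namespace for the next entry
      continue;
    }
    out.push_back((pendingTag << kTypeBits) | r.type);
    pendingTag = 0; // a vendor marker applies to one relocation only
  }
  return out;
}
```

Code that only handles standard relocations can then keep comparing against
plain type values, since untagged relocations are left unchanged.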
|
|
When wrapping a symbol `foo` via `-wrap=foo`, we create the symbol
`__wrap_foo` that replaces all mentions of `foo`. This feature was
implemented for wasm-ld in commit a5ca34e.
So far, no valid signature has been attached to the undefined symbol,
leading to a nullptr dereference in the logic for creating the import
section. This change adds the correct signature to the wrapped symbol,
enabling the generation of an import for it.
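For readers unfamiliar with symbol wrapping, the name mapping can be sketched
as below. This follows the GNU ld `--wrap` convention (references to `foo`
resolve to `__wrap_foo`, and references to `__real_foo` resolve to the
original `foo`); the function `resolveWrapped` is a hypothetical helper for
illustration and does not model signatures or wasm-ld's symbol table.

```cpp
#include <string>

// Hypothetical helper: given a referenced symbol name and the symbol passed
// to --wrap, return the name the reference should resolve to.
std::string resolveWrapped(const std::string &ref, const std::string &wrapped) {
  if (ref == wrapped)
    return "__wrap_" + wrapped; // calls to foo go to the wrapper
  if (ref == "__real_" + wrapped)
    return wrapped; // the wrapper reaches the original via __real_foo
  return ref; // unrelated symbols are untouched
}
```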
|
|
Further to
https://github.com/llvm/llvm-project/pull/147134#discussion_r2337246489,
switch to using the madvise() API to page in mmap'd files, and:
1) All new code is compiled under #if LLVM_ENABLE_THREADS so that the
changes from this PR are easy to identify.
2) The new PR uses madvise() instead of the ad-hoc page-referencing
code I wrote, which should avoid SIGSEGVs if the buffer is
deallocated.
3) A new property SerialBackgroundQueue().stopAllWork to be used to stop
background workers when there is no further call for them. Usually the
background "page-in" threads have completed first but it seems with this
troublesome test this is not always the case and buffers stored in the
static input file cache are being deallocated while being referenced.
---------
Co-authored-by: James Henderson <James.Henderson@sony.com>
|
|
(#169062)
I noticed that we had a hardcoded value of 4 for the pcrel section
relocations, which seems like an issue given that we recently added
support for 1-byte branch relocations in
https://github.com/llvm/llvm-project/pull/164439. The code included an
assert that the relevant relocation had the BYTE4 attribute, but that is
actually not enough to use a hardcoded value of 4: we need to assert
that the *other* `BYTE<n>` attributes are not set either.
However, since we did not support local branch relocations, that doesn't
seem to have mattered in practice. That said, local branch relocations
can be emitted by compilers, and ld64 does handle the 4-byte version of
them, so I've added support for it here.
ld64 actually seems to reject 1-byte section relocations, so the
questionable code is actually probably fine (minus the incorrect
assert). So we have two options: add an equivalent check in LLD, or just
support 1-byte local branch relocations. Supporting it actually requires
less code, so I've gone with that option here.
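The point about the assert can be sketched as follows. The enum values and
the helper `relocWidth` are hypothetical and only modelled on the `BYTE<n>`
attributes mentioned above; the idea is that a width is only well defined
when exactly one `BYTE<n>` bit is set, so checking for BYTE4 alone is not
enough.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical attribute bits; the values are chosen for illustration.
enum RelocAttr : uint32_t {
  BYTE1 = 1u << 0,
  BYTE4 = 1u << 1,
  BYTE8 = 1u << 2,
};
constexpr uint32_t kByteMask = BYTE1 | BYTE4 | BYTE8;

// Returns the relocation width in bytes when exactly one BYTE<n> bit is set,
// and std::nullopt when none or several are set.
std::optional<unsigned> relocWidth(uint32_t attrs) {
  switch (attrs & kByteMask) {
  case BYTE1:
    return 1;
  case BYTE4:
    return 4;
  case BYTE8:
    return 8;
  default:
    return std::nullopt; // ambiguous or missing width
  }
}
```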
|
|
|
|
The GNU documentation is ambiguous about the version index for
unversioned undefined symbols. The current specification at
https://sourceware.org/gnu-gabi/program-loading-and-dynamic-linking.txt
defines VER_NDX_LOCAL (0) as "The symbol is private, and is not
available outside this object."
However, this naming is misleading for undefined symbols. As suggested
in discussions, VER_NDX_LOCAL should conceptually be VER_NDX_NONE and
apply to unversioned undefined symbols as well.
GNU ld has used index 0 for unversioned undefined symbols both before
version 2.35 (see https://sourceware.org/PR26002) and in the upcoming
2.46 release (see https://sourceware.org/PR33577). This change aligns
with GNU ld's behavior by switching from index 1 to index 0.
While here, add a test to dso-undef-extract-lazy.s that undefined
symbols of index 0 in DSO are treated as unversioned symbols.
|
|
(#167825)
Currently, if multiple external weak symbols are defined at the same
address in an object file (e.g., by using the .set assembler directive
to alias them to a single weak variable), ld64.lld treats them as a
single unit. When any one of these symbols is overridden by a strong
definition, all of the original weak symbols resolve to the strong
definition.
This patch changes the behavior in `transplantSymbolsAtOffset`. When a
weak symbol is being replaced by a strong one, only non-external (local)
symbols at the same offset are moved to the new symbol's section. Other
*external* symbols are no longer transplanted.
This allows each external weak symbol to be overridden independently.
This behavior is consistent with Apple's ld-classic, but diverges from
ld-prime in one case, as noted on
https://github.com/llvm/llvm-project/issues/167262 (this discrepancy has
recently been reported to Apple).
### Backward Compatibility
This change alters linker behavior for a specific scenario. The creation
of multiple external weak symbols aliased to the same address via
assembler directives is primarily an advanced technique. It's unlikely
that existing builds rely on the current behavior of all aliases being
overridden together.
If there are concerns, this could be put behind a linker option, but the
new default seems more correct, less surprising, and is consistent with
ld-classic.
### Testing
The new lit test `test/MachO/weak-alias-override.s` verifies this
behavior using llvm-nm.
Fixes #167262
|
|
link.exe (#168364)
Various build tools may produce command lines invoking clang-cl and
lld-link which contain /link twice, e.g. `clang-cl.exe
sanitycheckcpp.cc /Fesanitycheckcpp.exe .... /link /link ...`
If link.exe is used, it ignores the extra `/link` and just issues a
warning; however, lld-link tries to treat `/link` as a file name.
This PR adds a flag which is ignored in order to improve compatibility
with link.exe.
There's some extra context including an "in-the-wild" example and
reproducer of the problem here:
https://github.com/frankier/meson_clang_win_activation
Co-authored-by: Frankie Robertson <frankie@robertson.name>
|
|
R_AARCH64_FUNCINIT64 is a dynamic relocation type for relocating
word-sized data in the output file using the return value of
a function. An R_AARCH64_FUNCINIT64 shall be relocated as an
R_AARCH64_IRELATIVE with the target symbol address if the target
symbol is non-preemptible, and it shall be a usage error to relocate an
R_AARCH64_FUNCINIT64 with a preemptible or STT_GNU_IFUNC target symbol.
The initial use case for this relocation type shall be for emitting
global variable field initializers for structure protection. With
structure protection, the relocation value computation is tied to the
compiler implementation in such a way that it would not be reasonable to
define a relocation type for it (for example, it may involve computing
a hash using a compiler-determined algorithm), hence the need for the
computation to be implemented as code in the binary.
Part of the AArch64 psABI extension:
https://github.com/ARM-software/abi-aa/issues/340
Reviewers: smithp35, fmayer, MaskRay
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/156564
|
|
My 2020 change that added versioned symbol recognition
(reviews.llvm.org/D80059) checks both VER_NDX_LOCAL and VER_NDX_GLOBAL,
though test coverage was missing. lld/test/ELF/dso-undef-extract-lazy.s
checks that the undefined symbol is indeed considered unversioned.
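The check described above can be sketched as below. The constant values come
from the GNU symbol versioning ABI (VER_NDX_LOCAL is 0, VER_NDX_GLOBAL is 1,
VERSYM_HIDDEN is 0x8000); the helper `isUnversionedUndef` and the `k`-prefixed
names are made up for this example and this is not the code in LLD.

```cpp
#include <cstdint>

// Values from the GNU symbol versioning ABI, renamed to avoid clashing with
// the macros in <elf.h>.
constexpr uint16_t kVerNdxLocal = 0;     // VER_NDX_LOCAL
constexpr uint16_t kVerNdxGlobal = 1;    // VER_NDX_GLOBAL
constexpr uint16_t kVersymHidden = 0x8000; // VERSYM_HIDDEN

// Hypothetical helper: an undefined symbol in a DSO is treated as
// unversioned when its version index is either 0 or 1.
bool isUnversionedUndef(uint16_t versym) {
  uint16_t idx = static_cast<uint16_t>(versym & ~kVersymHidden);
  return idx == kVerNdxLocal || idx == kVerNdxGlobal;
}
```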
|
|
Fixes: e1979aed0a15 ("Implement gd to ie relaxation for aarch64.")
|
|
https://github.com/llvm/llvm-project/pull/140307 added support for
cstring hashes in the orderfile to layout cstrings in a specific order,
but only when `--deduplicate-strings` is used. This PR supports cstring
ordering when `--no-deduplicate-strings` is used.
1. Create `cStringPriorities`, separate from `priorities`, to hold only
priorities for cstring pieces. This allows us to lookup by hash
directly, instead of first converting to a string. It also fixes a
contrived bug where we want to order a symbol named `CSTR;12345` rather
than a cstring.
2. Rather than calling `buildCStringPriorities()` which always
constructs and returns a vector, we use `forEachStringPiece()` to
efficiently iterate over cstring pieces without creating a new vector if
no cstring is ordered.
3. Create `SymbolPriorityEntry::{get,set}Priority()` helper functions to
simplify code.
|
|
SyntheticSection (#166323)
A field named 'size' is already available and perfectly usable via
inheritance from InputSection, and these variables shadow it for no good
reason.
The only interesting change here is in PaddingSection: because a
parent's field cannot be initialized via a constructor initializer list,
it needs to be set inside the constructor body.
|
|
We already ensure that code for different architectures is always placed
in different pages in `assignAddresses`. We represent those ranges using
their first and last chunks. However, the RVAs of those chunks may not
be page-aligned, for example, due to extra padding for entry-thunk
offsets. Align the chunk RVAs to the page boundary so that the emitted
ranges correctly include the entire region.
This change affects an existing test that checks corner cases triggered
by merging a data section into a code section. We may now include such
data in the code range. This differs from MSVC’s behavior, but it should
not cause practical issues, and the new behavior is arguably more
correct.
Fixes #168119.
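The alignment step can be sketched as follows. This is an illustration only:
the 4 KiB page size, the `RvaRange` struct with an exclusive end, and the
helper `pageAlign` are assumptions for the example, not the code in
`assignAddresses`.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 0x1000; // assumed 4 KiB pages

// Hypothetical range type; `end` is treated as exclusive in this sketch.
struct RvaRange {
  uint32_t start;
  uint32_t end;
};

// Both helpers require `a` to be a power of two.
constexpr uint32_t alignDown(uint32_t v, uint32_t a) { return v & ~(a - 1); }
constexpr uint32_t alignUp(uint32_t v, uint32_t a) {
  return (v + a - 1) & ~(a - 1);
}

// Widen a range so that it covers whole pages: the start moves down to the
// page containing the first chunk, the end moves up to the next boundary.
RvaRange pageAlign(RvaRange r) {
  assert(r.start <= r.end && "range must not be inverted");
  return {alignDown(r.start, kPageSize), alignUp(r.end, kPageSize)};
}
```

For example, a range starting at 0x1234 (because of padding before the first
chunk) and ending at 0x2ff0 would be emitted as 0x1000 to 0x3000.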
|
|
Ran into a use case where a MachO object file with a section
symbol that has no section associated with it causes a segfault during
linking. This patch handles such cases gracefully and prevents the
linker from crashing.
---------
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
|
|
The really painful part of this PR was updating all the test files. I
had some help from Gemini CLI there, which did a pretty good job (got
maybe 80% of the updates done).
Fixes: #151015
|