Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a simplified c12d49c4e286fa108d4d69f1c6d2b8d691993ffd in main
which just fixes the bug but does not affect the -O2 deduplication.
|
|
The deduplication requires a DenseMap of the same size of the local part of
.strtab . I optimized it in e20544543478b259eb09fa0a253d4fb1a5525d9e but it is
still quite slow.
For Release build of clang, deduplication makes .strtab 1.1% smaller and makes the link 3% slower.
For chrome, deduplication makes .strtab 0.1% smaller and makes the link 6% slower.
I suggest that we only perform the optimization with -O2 (default is -O1).
Not deduplicating local symbol names will simplify parallel symbol table write.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D118577
|
|
My x86-64 lld executable is 8KiB smaller.
|
|
The work is more balanced.
|
|
std::make_unique
Similar to D116143. My x86-64 lld executable is 20+KiB smaller.
|
|
|
|
* `RelocationBaseSection::addReloc` increases `numRelativeRelocs`, which
duplicates the work done by RelocationSection<ELFT>::writeTo.
* --pack-dyn-relocs=android has inappropropriate DT_RELACOUNT.
AndroidPackedRelocationSection does not necessarily place relative relocations
in the front and DT_RELACOUNT might cause semantics error (though our
implementation doesn't and Android bionic doesn't use DT_RELACOUNT anyway.)
Move `llvm::partition` to a new function `partitionRels` and compute
`numRelativeRelocs` there. Now `RelocationBaseSection::addReloc` is trivial and
can be moved to the header to enable inlining.
The rest of DynamicReloc and `-z combreloc` handling is moved to the
non-template `RelocationBaseSection::computeRels` to decrease code size. My
x86-64 lld executable is 44+KiB smaller.
While here, rename `sort` to `combreloc`.
|
|
|
|
|
|
|
|
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
The previous land f860fe362282ed69b9d4503a20e5d20b9a041189 caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by 22ee510dac9440a74b2e5b3fe3ff13ccdbf55af3.
Differential Revision: https://reviews.llvm.org/D108850
|
|
There is no remaining std::vector<InputSectionBase> now. My x86-64 lld
executable is 2KiB small.
|
|
strTabOffset stabilizes llvm::sort. My x86-64 executable is 5+KiB smaller.
|
|
After the D33630 fallout was properly fixed by a4c5db30be4e216834b44e31b47304ea1b92635f.
Tested by D37462/D44986 tests, the new --no-rosegment test in build-id.s, and a few --rosegment/--no-rosegment programs.
|
|
The new tests in build-id.s would catch problems if we made a mistake here.
|
|
|
|
redundant hash computation
5~6% speedup when linking clang and chrome.
|
|
It seems to be causing issues on https://lab.llvm.org/buildbot/#/builders/123/builds/8383
|
|
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
|
|
Sorting dynamic relocations is a bottleneck. Simplifying the comparator improves
performance. Linking clang is 4~5% faster with --threads=8.
This change may shuffle R_MIPS_REL32 for Mips and is a NFC for non-Mips.
|
|
Switch to the D114180 approach which is simpler and allows gnuHashTab/hashTab to
switch to unique_ptr.
|
|
db08df0570b6dfaf00d7b1b8555c1d2d4effb224 does not work because part.relrDyn is
a unique_ptr and `reset` destroys the object which may still be referenced.
This commit uses the D114180 approach. Also improve the test to check that there
is no R_X86_64_RELATIVE.
|
|
We only support both TLSDESC and TLS GD for x86 so this is an x86-specific
problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD
accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing
SymbolAux::tlsDescIdx.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D116900
|
|
to decrease sizeof(SymbolUnion) by 8 on ELF64 platforms.
Symbols needing such information are typically 1% or fewer (5134 out of 560520
when linking clang, 19898 out of 5550705 when linking chrome). Storing them
elsewhere can decrease memory usage and symbol initialization time.
There is a ~0.8% saving on max RSS when linking a large program.
Future direction:
* Move some of dynsymIndex/verdefIndex/versionId to SymbolAux
* Support mixed TLSDESC and TLS GD without increasing sizeof(SymbolUnion)
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D116281
|
|
and remove associated make<XXX> calls.
gnuHash and sysvHash are unchanged, otherwise LinkerScript::discard would
destroy the objects which may be referenced by input section descriptions.
My x86-64 lld executable is 121+KiB smaller.
|
|
|
|
|
|
This does resolve the redundancy in includeInDynsym().
|
|
This function may take ~1% time. SmallVector<SymbolTableEntry, 0> is smaller (16 bytes
instead of 24) and more efficient.
|
|
It is fairly easy to forget SectionBase::repl after ICF.
Let ICF rewrite a Defined symbol's `section` field to avoid references to
SectionBase::repl in subsequent passes. This slightly improves the --icf=none
performance due to less indirection (maybe for --icf={safe,all} as well if most
symbols are Defined).
With this change, there is only one reference to `repl` (--gdb-index D89751).
We can undo f4fb5fd7523f8e3c3b3966d43c0a28457b59d1d8 (`Move Repl to SectionBase.`)
but move `repl` to `InputSection` instead.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D116093
|
|
I added `PPC32Got2Section` D62464 to support .got2 but did not implement .got2
in another output section.
PR52799 has a linker script placing .got2 in .rodata, which causes a null
pointer dereference because a MergeSyntheticSection's file is nullptr.
Add the support.
|
|
associate make<XXX>"
This reverts commit e48b1c8a27f0fbd791edc8e45756b268caadfa66.
This reverts commit d019de23a1d761225fdaf0c47394ba58143aea9a.
The changes caused memory leaks (non-final classes cannot use unique_ptr).
|
|
|
|
See D116143 for benefits. My lld executable (x86-64) is 103+KiB smaller.
|
|
See D116143 for benefits. My lld executable (x86-64) is 24+KiB smaller.
|
|
|
|
|
|
This does not decrease sizeof(InputSection) (important for memory usage) on
ELF64 by itself but allows we to add another uint32_t.
|
|
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```
This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.
* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
|
|
With this patch, writing .debug_str is significantly for a program with
1.5G .debug_str:
* .debug_info 1.22s
* .debug_str 2.57s decreases to 0.66
|
|
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
|
|
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
|
|
This removes SpecificAlloc<Defined> and makes my lld executable 1.5k smaller.
This drops the small memory waste due to the separate BumpPtrAllocator.
|
|
scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
|
|
scanRelocations/postScanRelocations
May cause a failure for non-preemptible `bcmp` in a glibc -static link.
|
|
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
|
|
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in 6% total time speedup (the
benefit will greatly dilute if --pack-dyn-relocs=relr becomes prevailing).
Encoding the dynamic relocations then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplicate because of
Elf_Rel/Elf_Rela plus unfortunate mips64le), so don't do that.
|
|
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)
Mostly MFC, except for --help messages and the header row in
--print-archive-stats output.
|
|
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).
Also, improve naming of some BaseCommand variables (base -> cmd).
|
|
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.
The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
|