|
This changes how LLVM constructs certain data structures that relate to
exception handling (EH) on Windows. Specifically, it changes how
IP2State tables for functions are constructed. The purpose of this
change is to align LLVM with the requirements of the Windows AMD64 ABI,
which requires that IP2State table entries point to the boundaries
between instructions.
On most Windows platforms (AMD64, ARM64, ARM32, IA64, but *not* x86-32),
exception handling works by looking up instruction pointers in lookup
tables. These lookup tables are stored in `.xdata` sections in
executables. One element of these lookup tables is the `IP2State` table
(Instruction Pointer to State).
If a function has any instructions that require cleanup during exception
unwinding, then it will have an IP2State table. Each entry in the
IP2State table describes a range of bytes in the function's instruction
stream, and associates an "EH state number" with that range of
instructions. A value of -1 means "the null state", which does not
require any code to execute. A value other than -1 is an index into the
State table.
The entries in the IP2State table contain byte offsets within the
instruction stream of the function. The Windows ABI requires that these
offsets be aligned to instruction boundaries; they are not permitted to
point to a byte that is not the first byte of an instruction.
Unfortunately, CALL instructions present a problem during unwinding.
CALL instructions push the address of the instruction after the CALL
instruction, so that execution can resume after the CALL. If the CALL is
the last instruction within an IP2State region, then the return address
(on the stack) points to the *next* IP2State region. This means that the
unwinder will use the wrong cleanup funclet during unwinding.
To fix this problem, compilers should insert a NOP after a CALL
instruction, if the CALL instruction is the last instruction within an
IP2State region. The NOP is placed within the same IP2State region as
the CALL, so that the return address points to the NOP and the unwinder
will locate the correct region.
This PR modifies LLVM so that it inserts NOP instructions after CALL
instructions, when needed. In performance tests, the padding had no
detectable impact; the NOP is rarely needed, since it is only inserted
when the CALL is the last instruction before an IP2State transition or
the last instruction before the function epilogue.
NOP padding is only necessary on Windows AMD64 targets. On ARM64 and
ARM32, instructions have a fixed size so the unwinder knows how to "back
up" by one instruction.
Interaction with Import Call Optimization (ICO):
Import Call Optimization (ICO) is a compiler + OS feature on Windows
which improves the performance and security of DLL imports. ICO relies
on using a specific CALL idiom that can be replaced by the OS DLL
loader. This removes a load and indirect CALL and replaces it with a
single direct CALL.
To achieve this, ICO also inserts NOPs after the CALL instruction. If
the end of the CALL is aligned with an EH state transition, we *also*
insert a single-byte NOP. **Both forms of NOPs must be preserved.** They
cannot be combined into a single larger NOP; nor can the second NOP be
removed.
This is necessary because, if ICO is active and the call site is
modified by the loader, the loader will end up overwriting the NOPs that
were inserted for ICO. That means that those NOPs cannot be used for the
correct termination of the exception handling region (the IP2State
transition), so we still need an additional NOP instruction. The NOPs
cannot be combined into a longer NOP (which is ordinarily desirable)
because then ICO would split one instruction, producing a malformed
instruction after the ICO call.
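A hand-written sketch of the combined sequence (symbol names are made
up):
```
        call    qword ptr [rip + __imp_SomeFn]   # ICO import-call idiom
        nop                      # NOP reserved for ICO: the loader may
                                 # rewrite the call + this NOP into a
                                 # single direct CALL
        nop                      # separate 1-byte NOP that terminates the
                                 # EH region; must not be merged with or
                                 # folded into the ICO NOP above
```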
|
|
to align with ELF targets, where the relocation specifier constants are
all target-specific.
|
|
Rename these relocation specifier constants, aligning with the naming
convention used by other targets (`S_` instead of `VK_`).
Move constants to X86MCAsmInfo.h, with the goal of eventually removing
X86MCExpr.h.
Similar to #144633 for AArch64.
|
|
to MSVC /d2guardretpoline) (#126631)
This is the x64 equivalent of #121516.
Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618),
the section and constant names relating to it all mention "retpoline",
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.
As with the AArch64 feature, this emits a new section into the object
file which is used by the MSVC linker to generate the Dynamic Value
Relocation Table; the section itself does not appear in the final
binary.
The Windows Loader requires that a specific sequence of instructions be
emitted when this feature is enabled (see the sketch after this list):
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
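For illustration, a hand-written sketch of the two call forms under
these rules (`__imp_SomeFn` is a placeholder import symbol):
```
        # Call to an imported function: rex prefix + 5-byte NOP after it.
        call    qword ptr [rip + __imp_SomeFn]   # emitted with a rex prefix
        nop                                      # 5-byte NOP form

        # Indirect call: target pointer in rax, followed by a 3-byte NOP.
        mov     rax, r11
        call    rax
        nop                                      # 3-byte NOP form
```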
|
|
MachineInstr fixup and printing code. (#137331)
When -use-constant-{int,fp}-for-fixed-length-splat are enabled, constant
vector splats take the form of ConstantInt/FP instead of ConstantVector.
These constants get linked to MachineInstrs via constant pools for later
processing. The processing assumed that ConstantInt/FP always
represents a scalar constant; this PR extends the code to support
vector types.
NOTE: The test choices are somewhat artificial because pretty much all
the vector tests failed without these changes when the new constants are
enabled.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
(equivalent to MSVC /d2epilogunwind) (#129142)
Adds support for emitting Windows x64 Unwind V2 information, including
support for `/d2epilogunwind` in clang-cl.
Unwind v2 adds information about the epilogs in functions so that the
unwinder can unwind even in the middle of an epilog, without having to
disassemble the function to see what has or has not been cleaned up.
Unwind v2 requires that all epilogs be in "canonical" form (see the sketch after this list):
* If there was a stack allocation (fixed or dynamic) in the prolog, then
the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding
`POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.
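A canonical epilog therefore looks like this (hand-written example; the
allocation size and registers are arbitrary):
```
        add     rsp, 40          # 1) deallocate the prolog's stack allocation
        pop     rsi              # 2) POPs in exact reverse order of the
        pop     rdi              #    prolog's PUSHes
        ret                      # 3) the terminator ends the epilog
```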
This change adds a pass to validate epilogs in modules that have Unwind
v2 enabled and, if they pass, emits new pseudo instructions to MC that
1) note that the function is using unwind v2 and 2) mark the start of
the epilog (this is either the first `POP` if there is one, otherwise
the terminator instruction). If a function does not meet these
requirements, it is downgraded to Unwind v1 (i.e., these new pseudo
instructions are not emitted).
Note that the unwind v2 table only marks the size of the epilog in the
"header" unwind code, but it's possible for epilogs to use different
terminator instructions, so they are not all the same size. As a
workaround, MC will assume that all terminator instructions are
1 byte long - this still works correctly with the Windows unwinder as it
is only using the size to do a range check to see if a thread is in an
epilog or not, and since the instruction pointer will never be in the
middle of an instruction and the terminator is always at the end of an
epilog the range check will function correctly. This does mean, however,
that the "at end" optimization (where an epilog unwind code can be
elided if the last epilog is at the end of the function) can only be
used if the terminator is 1 byte long.
One other complication with the implementation is that the unwind table
for a function is emitted during streaming, but we can't calculate the
distance between an epilog and the end of the function at that time, as
layout hasn't been completed yet (thus some instructions may still be
relaxed). To work around this, epilog unwind codes are emitted via a
fixup. This also means that we can't pre-emptively downgrade a function
to Unwind v1 if one of these offsets is too large, so instead we raise
an error (but I've passed through the location information, so the user
will know which of their functions is problematic).
|
|
The Triple and Subtarget API functions `isOSWindowsOrUEFI` are not
preferred. Drop them.
|
|
same as AsmPrinter.OutContext (#133352)
In `X86MCInstLower::LowerMachineOperand`, a new `MCSymbol` can be
created in `GetSymbolFromOperand(MO)` where `MO.getType()` is
`MachineOperand::MO_ExternalSymbol`
```
case MachineOperand::MO_ExternalSymbol:
return LowerSymbolOperand(MO, GetSymbolFromOperand(MO));
```
at
https://github.com/llvm/llvm-project/blob/725a7b664b92cd2e884806de5a08900b43d43cce/llvm/lib/Target/X86/X86MCInstLower.cpp#L196
However, this newly created symbol will not be marked properly with its
`IsExternal` field since `Ctx.getOrCreateSymbol(Name)` doesn't know if
the newly created `MCSymbol` is for `MachineOperand::MO_ExternalSymbol`.
Looking at other backends, for example, this is how `AArch64MCInstLower`
handles `MO_ExternalSymbol`:
https://github.com/llvm/llvm-project/blob/14c36db16fc090ef494ff6d8207562c414b40e30/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp#L366-L367
https://github.com/llvm/llvm-project/blob/14c36db16fc090ef494ff6d8207562c414b40e30/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp#L145-L148
It creates/gets the MCSymbol from `AsmPrinter.OutContext` instead of
from `Ctx`. Moreover, `Ctx` for `AArch64MCInstLower` is the same as
`AsmPrinter.OutContext`:
https://github.com/llvm/llvm-project/blob/8e7d6baf0e013408be932758b4a5334c14a34086/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L100
This applies to almost all the other backends except X86 and M68k.
```
$ git grep "MCInstLowering("
lib/Target/AArch64/AArch64AsmPrinter.cpp:100: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this),
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:223: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:257: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/R600MCInstLower.cpp:52: R600MCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/ARC/ARCAsmPrinter.cpp:41: MCInstLowering(&OutContext, *this) {}
lib/Target/AVR/AVRAsmPrinter.cpp:196: AVRMCInstLower MCInstLowering(OutContext, *this);
lib/Target/BPF/BPFAsmPrinter.cpp:144: BPFMCInstLower MCInstLowering(OutContext, *this);
lib/Target/CSKY/CSKYAsmPrinter.cpp:41: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this) {}
lib/Target/Lanai/LanaiAsmPrinter.cpp:147: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/Lanai/LanaiAsmPrinter.cpp:184: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/MSP430/MSP430AsmPrinter.cpp:149: MSP430MCInstLower MCInstLowering(OutContext, *this);
lib/Target/Mips/MipsAsmPrinter.h:126: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(*this) {}
lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:695: WebAssemblyMCInstLower MCInstLowering(OutContext, *this);
lib/Target/X86/X86MCInstLower.cpp:2200: X86MCInstLower MCInstLowering(*MF, *this);
```
This patch makes `X86MCInstLower` and `M68kMCInstLower` take their
`Ctx` from `AsmPrinter.OutContext` instead of from `MF.getContext()`,
to be consistent with all the other backends.
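In sketch form (hypothetical class shape, not the real X86MCInstLower
definition):
```
class MCInstLoweringSketch {
  MCContext &Ctx;

public:
  // Before: Ctx(MF.getContext()) - a per-(sub)module context.
  // After: take the shared AsmPrinter output context instead.
  MCInstLoweringSketch(AsmPrinter &Printer) : Ctx(Printer.OutContext) {}

  MCSymbol *getExternalSymbol(StringRef Name) {
    // OutContext is shared, so the same Name always yields the same
    // MCSymbol* across submodules.
    return Ctx.getOrCreateSymbol(Name);
  }
};
```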
Since the normal use case (probably anything other than our
unconventional case) handles one LLVM module all the way through the
codegen pipeline until the end of code emission (AsmPrinter),
`AsmPrinter.OutContext` is the same as the MachineFunction's MCContext,
so I think this change is an NFC.
----
This fixes an error while running the generated code in ORC JIT for our
use case with
[MCLinker](https://youtu.be/yuSBEXkjfEA?si=HjgjkxJ9hLfnSvBj&t=813) (see
more details below):
https://github.com/llvm/llvm-project/pull/133291#issuecomment-2759200983
We (Mojo) are trying to do MC-level linking, where we break an LLVM
module into multiple submodules to compile and codegen in parallel
(technically into *.o files with symbol linkage type changes), but
instead of archiving all of them into one `.a` file, we want to fix the
symbol linkage types and still produce one *.o file. The parallel
codegen pipeline generates the codegen data structures in their own
`MCContext` (which is `Ctx` here). So if functions `f` and `g` get
split into different submodules, they will have different `Ctx`s, and
when we try to create an external symbol with the same name for each of
them with `Ctx.getOrCreate(SymName)`, we will get two different
`MCSymbol*`s, because `f` and `g`'s `MCContext`s are different and
can't see each other. This is unfortunately not what we want for
external symbols. Using `AsmPrinter.OutContext` helps: since it is
shared, if we try to get or create the `MCSymbol` there, we'll be able
to deduplicate.
|
|
|
|
Reverts llvm/llvm-project#108880.
The patch has no regression test, no description of why the fix is
necessary, and the code modifies MC data structures in a way that's
forbidden in the AsmPrinter.
Fixes #132055.
|
|
The test file is over 4GiB, which is too big, so I didn’t submit it.
|
|
Move target-specific members outside of MCSymbolRefExpr::VariantKind
(a legacy interface I am eliminating). Most changes are mechanical,
except:
* ELFObjectWriter::shouldRelocateWithSymbol
* The legacy generic code uses `ELFObjectWriter::fixSymbolsInTLSFixups`
to set `STT_TLS` (with an unnecessary expression walk). The better way
is to do this in `getRelocType`, which I have done for AArch64,
PowerPC, and RISC-V.
In the future, we should encode expressions with a relocation specifier
as X86MCExpr and use MCValue::RefKind to hold the specifier of the
relocatable expression.
https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
While here, rename "Modifier" to "Specifier":
> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
Pull Request: https://github.com/llvm/llvm-project/pull/132149
|
|
data is smaller than the printed data
Bail out if the constant types aren't compatible
Fixes #131389
|
|
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
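A sketch of the usage difference (assuming `Module::getTargetTriple()`
now returns the stored `Triple`):
```
// Before: every query had to parse the stored string first.
//   Triple TT(M.getTargetTriple());
// After: the Module holds a parsed Triple that can be queried directly.
const Triple &TT = M.getTargetTriple();
bool IsWindows = TT.isOSWindows();
```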
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibility purposes, and
instead write out the needed Triple()s or str()s explicitly. This is
because I think a decent number of them should be changed to work on
Triple as well, to avoid unnecessary conversions back and forth.
The only interesting part of this patch is that the default triple is
Triple("") instead of Triple(), to preserve existing behavior. The
former defaults to using the ELF object format instead of the unknown
object format. We should fix that as well.
|
|
|
|
Windows x64 Unwind V2 adds epilog information to unwind data:
specifically, the length of the epilog and the offset of each epilog.
The first step is to add markers to the beginning and end of each
epilog when generating Windows x64 code. I've modelled this after how
LLVM marks ARM and AArch64 epilogs on Windows (and unified the code
between the three).
|
|
The check for `isOSWindows() || isUEFI()` is used in several places
across the codebase. Introduce `isOSWindowsOrUEFI()` in Triple.h to
simplify these checks.
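The helper is a thin wrapper along these lines (a sketch; the actual
declaration lives in Triple.h):
```
/// Tests whether the OS is either Windows or UEFI.
bool isOSWindowsOrUEFI() const { return isOSWindows() || isUEFI(); }
```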
|
|
Identified with misc-include-cleaner.
|
|
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
(See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)
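A minimal example of the pattern being cleaned up (`Buf`/`OS` are
illustrative names):
```
std::string Buf;
llvm::raw_string_ostream OS(Buf);
OS << "hello";
// OS.flush();  // removed: raw_string_ostream is unbuffered and writes
//              // straight to Buf, so Buf is already up to date here
```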
|
|
- [x] Mark the `MCSymbol` for `MO_ExternalSymbol` as external when it
is created.
|
|
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
|
|
|
|
For more details about this feature, please refer to the latest Intel 64 and
IA-32 Architectures Optimization Reference Manual Volume 1:
https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
|
|
This caused the MCOperand to be returned in memory. An MCOperand is only
16 bytes and therefore can be returned in registers on x86-64 and
AArch64 (and others).
|
|
(#95403)
As discussed on #90748 - we can avoid unpacks/extensions from vXi8 to vXi16 by using PMADDUBSW instead and packing the vXi16 results back together.
|
|
Based on feedback from #95403 - we use multiply-by-constant for various lowerings (shifts, division, etc.), so it's very useful to print out the constants to help understand the transform involved.
vXi16 multiplies are the easiest to add for this initial commit, but we can add other arithmetic instructions as follow-ups when the need arises (I intend to add PMADDUBSW handling for #95403 next).
I've done my best to update all test checks, but there are bound to be ones that got missed that will only appear when the file is regenerated.
|
|
xray instruments tail call function exits by inserting a nop sled before
the tail call. When tracing is enabled, the nop sled is replaced with a
call to `__xray_FunctionTailExit()`. This currently does not work for
conditional tail calls, as the instrumentation assumes that the tail
call will be unconditional. This causes two issues:
- `__xray_FunctionTailExit()` is inappropriately called even when the
tail call is not taken.
- `__xray_FunctionTailExit()`'s prologue/epilogue adjusts the stack
pointer with add/sub instructions. This clobbers condition flags,
which can flip the condition used for the tail call, leading to
incorrect program behavior.
Fix this by rewriting conditional calls when lowering patchable tail
calls.
With this change, a conditional patchable tail call like:
```
je target
```
will be lowered to:
```
jne .fallthrough
.p2align 1, ..
.Lxray_sled_N:
SLED_CODE
jmp target
.fallthrough:
```
|
|
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of a TRI parameter, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all call sites of these functions have been updated accordingly to ensure no additional impact.
After this, callers of these functions must explicitly decide whether to pass the `TargetRegisterInfo` or just a `nullptr`.
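A typical call site changes along these lines (illustrative; `MI` and
`TRI` come from the surrounding code):
```
// Before: TRI was a trailing default parameter and easy to omit:
//   if (MI.modifiesRegister(X86::EFLAGS)) ...
// After: the TargetRegisterInfo argument is explicit and required;
// callers pass the real TRI or a deliberate nullptr.
bool ClobbersFlags = MI.modifiesRegister(X86::EFLAGS, TRI);
```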
|
|
This fixes an edge case where functions starting with inline assembly
would assert while trying to lower that inline asm instruction.
After this PR, for now we always add a no-op (xchgw in this case) without
considering the size of the next inline asm instruction. We might want
to revisit this in the future.
This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH.
Should close https://github.com/llvm/llvm-project/issues/56234
|
|
|
|
TargetOptions/MCAsmInfo to MCTargetOptions
The convention is for such MC-specific options to reside in
MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do
not follow the convention: `CompressDebugSections` is defined in both
TargetOptions and MCAsmInfo and there is forwarding complexity.
Move the options to MCTargetOptions, thereby simplifying the code.
Rename the misleading RelaxELFRelocations to X86RelaxRelocations.
llvm-mc -relax-relocations and llc -x86-relax-relocations can now be
unified.
|
|
Replaces internal helper used by addConstantComments to allow reuse in a future patch.
|
|
|
|
Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register.
This will help further improve the 'fixup constant' comments from #73509.
|
|
Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register.
This will help further improve the 'fixup constant' comments from #73509.
|
|
Remove handling from EmitAnyX86InstComments and handle all VPMOVSX/VPMOVZX comments in addConstantComments now that we can generically handle the destination + mask register and shuffle mask comment
|
|
printDstRegisterName helpers. NFC.
This will allow us to easily use printDstRegisterName for other mask predicate destination registers, and to print out shuffle masks from other instruction types.
|
|
predicates instead of src index offsets.
|
|
repeated switch cases etc. NFC.
|
|
masked predicates. NFC.
|
|
stored in a truncated form (#80428)
Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extending it as part of the load.
|
|
llvm-project/llvm/lib/Target/X86/X86MCInstLower.cpp:1588:48:
error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
if (C && C->getType()->getScalarSizeInBits() == SrcEltBits) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~
1 error generated.
|
|
be stored in a truncated form (#79815)
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extending it as part of the load.
I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.
I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
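For example (hand-written illustration), a `<4 x i32>` constant whose
elements all fit in a signed byte can be stored as a 4-byte pool entry
and sign-extended during the load:
```
        # Before: full-width 16-byte constant pool entry
        #   movdqa  xmm0, xmmword ptr [rip + .LCPI0_0]
        # After: 4-byte entry, sign-extended as part of the load
        vpmovsxbd xmm0, dword ptr [rip + .LCPI0_0]
        # where .LCPI0_0 is now just: .byte -1, 0, 1, 2
```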
|
|
This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.
|
|
Previously, tail jump pseudo-opcodes were skipped by the
`encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`.
This caused emission of a 2-byte NOP and dropping of the tail jump.
With this PR, we change `PATCHABLE_OP` to no longer wrap the first
`MachineInstr`, but to insert itself before it,
leaving the instruction unaltered. At lowering time in `X86AsmPrinter`,
we now "look ahead" for the next non-pseudo `MachineInstr` and
lower+encode it, to inspect its size. If the size is below what
`PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That
way, now the first `MachineInstr` is always lowered as usual even if
`"patchable-function"="prologue-short-redirect"` is used.
Fixes https://github.com/llvm/llvm-project/issues/76879,
https://github.com/llvm/llvm-project/issues/76958 and
https://github.com/llvm/llvm-project/issues/59039
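For `"patchable-function"="prologue-short-redirect"`, the emitted
prologue now looks like this sketch when the first real instruction is
too short to patch (function name and instructions are made up):
```
foo:
        xchg    ax, ax           # 2-byte NOP inserted by PATCHABLE_OP
                                 # because `push rbp` is only 1 byte
        push    rbp              # first MachineInstr, lowered unaltered
```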
|
|
|
|
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
|
|
Displacement MachineOperand
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
|
|
Pull out helper instead of repeating switch cases.
|
|
Pull out helper instead of repeating switch cases.
|