path: root/llvm/lib/Target/X86/X86MCInstLower.cpp
Age | Commit message | Author | Files | Lines
2025-07-22 | Fix Windows EH IP2State tables (remove +1 bias) (#144745) | sivadeilra | 1 | -35/+173
This changes how LLVM constructs certain data structures that relate to exception handling (EH) on Windows. Specifically, it changes how IP2State tables for functions are constructed. The purpose of this change is to align LLVM with the requirements of the Windows AMD64 ABI, which requires that the IP2State table entries point to the boundaries between instructions.

On most Windows platforms (AMD64, ARM64, ARM32, IA64, but *not* x86-32), exception handling works by looking up instruction pointers in lookup tables. These lookup tables are stored in `.xdata` sections in executables. One element of the lookup tables is the `IP2State` table (Instruction Pointer to State). If a function has any instructions that require cleanup during exception unwinding, then it will have an IP2State table. Each entry in the IP2State table describes a range of bytes in the function's instruction stream, and associates an "EH state number" with that range of instructions. A value of -1 means "the null state", which does not require any code to execute. A value other than -1 is an index into the State table.

The entries in the IP2State table contain byte offsets within the instruction stream of the function. The Windows ABI requires that these offsets are aligned to instruction boundaries; they are not permitted to point to a byte that is not the first byte of an instruction.

Unfortunately, CALL instructions present a problem during unwinding. CALL instructions push the address of the instruction after the CALL instruction, so that execution can resume after the CALL. If the CALL is the last instruction within an IP2State region, then the return address (on the stack) points to the *next* IP2State region. This means that the unwinder will use the wrong cleanup funclet during unwinding.

To fix this problem, compilers should insert a NOP after a CALL instruction, if the CALL instruction is the last instruction within an IP2State region. The NOP is placed within the same IP2State region as the CALL, so that the return address points to the NOP and the unwinder will locate the correct region.

This PR modifies LLVM so that it inserts NOP instructions after CALL instructions, when needed. In performance tests, the NOP has no detectable effect. The NOP is rarely inserted, since it is only inserted when the CALL is the last instruction before an IP2State transition or the CALL is the last instruction before the function epilogue.

NOP padding is only necessary on Windows AMD64 targets. On ARM64 and ARM32, instructions have a fixed size so the unwinder knows how to "back up" by one instruction.

Interaction with Import Call Optimization (ICO):

Import Call Optimization (ICO) is a compiler + OS feature on Windows which improves the performance and security of DLL imports. ICO relies on using a specific CALL idiom that can be replaced by the OS DLL loader. This removes a load and indirect CALL and replaces it with a single direct CALL.

To achieve this, ICO also inserts NOPs after the CALL instruction. If the end of the CALL is aligned with an EH state transition, we *also* insert a single-byte NOP. **Both forms of NOPs must be preserved.** They cannot be combined into a single larger NOP; nor can the second NOP be removed.

This is necessary because, if ICO is active and the call site is modified by the loader, the loader will end up overwriting the NOPs that were inserted for ICO. That means that those NOPs cannot be used for the correct termination of the exception handling region (the IP2State transition), so we still need an additional NOP instruction. The NOPs cannot be combined into a longer NOP (which is ordinarily desirable) because then ICO would split one instruction, producing a malformed instruction after the ICO call.
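The insertion rule described in this commit message can be sketched with a toy instruction model. This is only an illustration under stated assumptions: `Inst` and `needsNopAfterCall` are hypothetical names, not LLVM's `MachineInstr` API, and treating a CALL at the very end of the function as a boundary is an assumption the message does not spell out.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction (hypothetical, not LLVM's MachineInstr): just enough
// state to model the rule from the commit message.
struct Inst {
  bool IsCall;     // true for CALL instructions
  int EHState;     // EH state of the enclosing IP2State region (-1 = null state)
  bool IsEpilogue; // true if the instruction belongs to the function epilogue
};

// Returns true if a NOP should follow the instruction at index I, i.e. it is
// a CALL and the next instruction starts a different IP2State region or the
// function epilogue.
bool needsNopAfterCall(const std::vector<Inst> &Insts, std::size_t I) {
  if (!Insts[I].IsCall)
    return false;
  // Assumption: a CALL that ends the function is treated as a boundary too.
  if (I + 1 == Insts.size())
    return true;
  const Inst &Next = Insts[I + 1];
  return Next.EHState != Insts[I].EHState || Next.IsEpilogue;
}
```

With the NOP placed in the CALL's own region, the return address pushed by the CALL falls inside that region, which is the property the unwinder lookup relies on.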
2025-06-27 | MCExpr: Make COFF-specific VK_SECREL target-specific | Fangrui Song | 1 | -1/+1
to align with ELF targets, where the relocation specifier constants are all target-specific.
2025-06-27 | X86: Rename X86MCExpr::VK_ to X86::S_ | Fangrui Song | 1 | -39/+38
Rename these relocation specifier constants, aligning with the naming convention used by other targets (`S_` instead of `VK_`). Move constants to X86MCAsmInfo.h, with the goal of eventually removing X86MCExpr.h. Similar to #144633 for AArch64.
2025-05-20 | [x64][win] Add compiler support for x64 import call optimization (equivalent to MSVC /d2guardretpoline) (#126631) | Daniel Paoliello | 1 | -15/+155
This is the x64 equivalent of #121516.

Since import call optimization was originally [added to x64 Windows to implement a more efficient retpoline mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618), the section and constant names relating to this all mention "retpoline", and we need to mark indirect calls, control-flow guard calls and jumps for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which is used by the MSVC linker to generate the Dynamic Value Relocation Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in `rax`.
* Calls to imported functions must use the `rex` prefix and be followed by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
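The NOP sizes in the list above can be captured as a small lookup. This is a sketch only: `CallKind` and `trailingNopBytes` are hypothetical names, and the zero-byte result for ordinary direct calls is an assumption, since the message only states the requirements for imported and indirect calls.

```cpp
// Hypothetical classification of call sites; not an LLVM type.
enum class CallKind { Direct, Imported, Indirect };

// Number of NOP bytes that must follow the call, per the rules listed in the
// commit message. The Direct case is an assumption (no padding stated).
constexpr unsigned trailingNopBytes(CallKind K) {
  switch (K) {
  case CallKind::Imported:
    return 5; // "followed by a 5-byte nop"
  case CallKind::Indirect:
    return 3; // "followed by a 3-byte nop"
  case CallKind::Direct:
    return 0; // assumption: no loader-visible padding needed
  }
  return 0;
}
```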
2025-05-13 | [NFC][LLVM][CodeGen][X86] Add ConstantInt/FP based vector support to MachineInstr fixup and printing code. (#137331) | Paul Walker | 1 | -2/+16
When -use-constant-{int,fp}-for-fixed-length-splat are enabled, constant vector splats take the form of ConstantInt/FP instead of ConstantVector. These constants get linked to MachineInstrs via constant pools for later processing. The processing assumes ConstantInt/FP to always represent scalar constants, with this PR extending the code to support vector types.

NOTE: The test choices are somewhat artificial because pretty much all the vector tests failed without these changes when the new constants are enabled.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-09 | [win][x64] Unwind v2 3/n: Add support for emitting unwind v2 information (equivalent to MSVC /d2epilogunwind) (#129142) | Daniel Paoliello | 1 | -0/+10
Adds support for emitting Windows x64 Unwind V2 information, including support for `/d2epilogunwind` in clang-cl.

Unwind v2 adds information about the epilogs in functions such that the unwinder can unwind even in the middle of an epilog, without having to disassemble the function to see what has or has not been cleaned up.

Unwind v2 requires that all epilogs are in "canonical" form:
* If there was a stack allocation (fixed or dynamic) in the prolog, then the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding `POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.

This change adds a pass to validate epilogs in modules that have Unwind v2 enabled and, if they pass, emits new pseudo instructions to MC that 1) note that the function is using unwind v2 and 2) mark the start of the epilog (this is either the first `POP` if there is one, otherwise the terminator instruction). If a function does not meet these requirements, it is downgraded to Unwind v1 (i.e., these new pseudo instructions are not emitted).

Note that the unwind v2 table only marks the size of the epilog in the "header" unwind code, but it's possible for epilogs to use different terminator instructions, thus they are not all the same size. As a workaround for this, MC will assume that all terminator instructions are 1 byte long. This still works correctly with the Windows unwinder, as it is only using the size to do a range check to see if a thread is in an epilog or not, and since the instruction pointer will never be in the middle of an instruction and the terminator is always at the end of an epilog, the range check will function correctly. This does mean, however, that the "at end" optimization (where an epilog unwind code can be elided if the last epilog is at the end of the function) can only be used if the terminator is 1 byte long.

One other complication with the implementation is that the unwind table for a function is emitted during streaming, however we can't calculate the distance between an epilog and the end of the function at that time as layout hasn't been completed yet (thus some instructions may be relaxed). To work around this, epilog unwind codes are emitted via a fixup. This also means that we can't pre-emptively downgrade a function to Unwind v1 if one of these offsets is too large, so instead we raise an error (but I've passed through the location information, so the user will know which of their functions is problematic).
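The three "canonical form" rules can be expressed as a small checker over a toy epilog representation. This is a sketch, not the validation pass added by the commit: `OpKind`, `EpilogOp` and `isCanonicalEpilog` are hypothetical names, and registers are modelled as plain integers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of an epilog instruction.
enum class OpKind { StackDealloc, Pop, Terminator };
struct EpilogOp {
  OpKind Kind;
  int Reg; // only meaningful for Pop
};

// Checks the three rules from the commit message:
//  1. a prolog stack allocation requires a leading stack deallocation,
//  2. POPs must mirror the prolog's PUSHes in exact reverse order,
//  3. the epilog must end with the terminator.
bool isCanonicalEpilog(bool PrologAllocatesStack,
                       const std::vector<int> &PushedRegs,
                       const std::vector<EpilogOp> &Epilog) {
  std::size_t I = 0;
  if (PrologAllocatesStack) {
    if (I >= Epilog.size() || Epilog[I].Kind != OpKind::StackDealloc)
      return false;
    ++I;
  }
  for (auto It = PushedRegs.rbegin(); It != PushedRegs.rend(); ++It, ++I) {
    if (I >= Epilog.size() || Epilog[I].Kind != OpKind::Pop ||
        Epilog[I].Reg != *It)
      return false;
  }
  // Exactly one instruction must remain, and it must be the terminator.
  return I + 1 == Epilog.size() && Epilog[I].Kind == OpKind::Terminator;
}
```

A function whose epilog fails such a check would, per the message, be downgraded to Unwind v1 rather than rejected.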
2025-05-06 | [NFC][llvm] Drop isOsWindowsOrUEFI API (#138733) | Prabhu Rajasekaran | 1 | -1/+1
The `isOsWindowsOrUEFI` API functions on Triple and SubTarget are not preferred. Dropping them.
2025-04-04 | [X86Backend][M68KBackend] Make Ctx in X86MCInstLower (M68KInstLower) the same as AsmPrinter.OutContext (#133352) | weiwei chen | 1 | -2/+2
In `X86MCInstLower::LowerMachineOperand`, a new `MCSymbol` can be created in `GetSymbolFromOperand(MO)` where `MO.getType()` is `MachineOperand::MO_ExternalSymbol`
```
  case MachineOperand::MO_ExternalSymbol:
    return LowerSymbolOperand(MO, GetSymbolFromOperand(MO));
```
at https://github.com/llvm/llvm-project/blob/725a7b664b92cd2e884806de5a08900b43d43cce/llvm/lib/Target/X86/X86MCInstLower.cpp#L196

However, this newly created symbol will not be marked properly with its `IsExternal` field since `Ctx.getOrCreateSymbol(Name)` doesn't know if the newly created `MCSymbol` is for `MachineOperand::MO_ExternalSymbol`.

Looking at other backends, for example at how `AArch64MCInstLower` handles `MO_ExternalSymbol`:
https://github.com/llvm/llvm-project/blob/14c36db16fc090ef494ff6d8207562c414b40e30/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp#L366-L367
https://github.com/llvm/llvm-project/blob/14c36db16fc090ef494ff6d8207562c414b40e30/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp#L145-L148

It creates/gets the MCSymbol from `AsmPrinter.OutContext` instead of from `Ctx`. Moreover, `Ctx` for `AArch64MCInstLower` is the same as `AsmPrinter.OutContext`. https://github.com/llvm/llvm-project/blob/8e7d6baf0e013408be932758b4a5334c14a34086/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L100. This applies to almost all the other backends except X86 and M68k.
```
$ git grep "MCInstLowering("
lib/Target/AArch64/AArch64AsmPrinter.cpp:100: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this),
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:223: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:257: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/R600MCInstLower.cpp:52: R600MCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/ARC/ARCAsmPrinter.cpp:41: MCInstLowering(&OutContext, *this) {}
lib/Target/AVR/AVRAsmPrinter.cpp:196: AVRMCInstLower MCInstLowering(OutContext, *this);
lib/Target/BPF/BPFAsmPrinter.cpp:144: BPFMCInstLower MCInstLowering(OutContext, *this);
lib/Target/CSKY/CSKYAsmPrinter.cpp:41: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this) {}
lib/Target/Lanai/LanaiAsmPrinter.cpp:147: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/Lanai/LanaiAsmPrinter.cpp:184: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/MSP430/MSP430AsmPrinter.cpp:149: MSP430MCInstLower MCInstLowering(OutContext, *this);
lib/Target/Mips/MipsAsmPrinter.h:126: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(*this) {}
lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:695: WebAssemblyMCInstLower MCInstLowering(OutContext, *this);
lib/Target/X86/X86MCInstLower.cpp:2200: X86MCInstLower MCInstLowering(*MF, *this);
```
This patch makes `X86MCInstLower` and `M68KInstLower` take their `Ctx` from `AsmPrinter.OutContext` instead of getting it from `MF.getContext()`, to be consistent with all the other backends.

I think since the normal use case (probably anything other than our unconventional case) only handles one llvm module all the way through the codegen pipeline till the end of code emission (AsmPrint), `AsmPrinter.OutContext` is the same as MachineFunction's MCContext, so this change is an NFC.

----

This fixes an error while running the generated code in ORC JIT for our use case with [MCLinker](https://youtu.be/yuSBEXkjfEA?si=HjgjkxJ9hLfnSvBj&t=813) (see more details below): https://github.com/llvm/llvm-project/pull/133291#issuecomment-2759200983

We (Mojo) are trying to do MC-level linking, so that we break an llvm module into multiple submodules to compile and codegen in parallel (technically into *.o files with symbol linkage type change), but instead of archiving all of them into one `.a` file, we want to fix the symbol linkage type and still produce one *.o file. The parallel codegen pipeline generates the codegen data structures in their own `MCContext` (which is `Ctx` here). So if functions `f` and `g` got split into different submodules, they will have different `Ctx`. And when we try to create an external symbol with the same name for each of them with `Ctx.getOrCreate(SymName)`, we will get two different `MCSymbol*` because `f` and `g`'s `MCContext` are different and they can't see each other. This is unfortunately not what we want for external symbols. Using `AsmPrinter.OutContext` helps: since it is shared, if we try to get or create the `MCSymbol` there, we'll be able to deduplicate.
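The deduplication argument can be illustrated with a toy symbol table. `Context` and `Symbol` below are hypothetical stand-ins and deliberately not LLVM's `MCContext` or `MCSymbol`; the sketch only shows why two separate contexts hand out distinct symbol objects for the same name while a shared one does not.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an MC symbol.
struct Symbol {
  std::string Name;
};

// Hypothetical stand-in for a context that owns symbols and interns them
// by name, so repeated lookups of one name return the same object.
class Context {
  std::map<std::string, std::unique_ptr<Symbol>> Symbols;

public:
  Symbol *getOrCreateSymbol(const std::string &Name) {
    std::unique_ptr<Symbol> &Slot = Symbols[Name];
    if (!Slot)
      Slot = std::make_unique<Symbol>(Symbol{Name});
    return Slot.get();
  }
};
```

Two `Context` objects asked for the same name return different pointers, which mirrors the per-submodule situation described above; asking one shared `Context` twice returns the same pointer.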
2025-03-29 | [X86] Use MCRegister. NFC | Craig Topper | 1 | -1/+1
2025-03-27 | Revert "[MC] Explicitly mark MCSymbol for MO_ExternalSymbol" (#133291) | Eli Friedman | 1 | -5/+1
Reverts llvm/llvm-project#108880. The patch has no regression test, no description of why the fix is necessary, and the code is modifying MC data structures in a way that's forbidden in the AsmPrinter. Fixes #132055.
2025-03-21 | [llvm:ir] Add support for constant data exceeding 4GiB (#126481) | pzzp | 1 | -1/+1
The test file is over 4GiB, which is too big, so I didn’t submit it.
2025-03-20 | Move X86-specific MCSymbolRefExpr::VariantKind to X86MCExpr::Specifier | Fangrui Song | 1 | -41/+40
Move target-specific members outside of MCSymbolRefExpr::VariantKind (a legacy interface I am eliminating). Most changes are mechanical, except:
* ELFObjectWriter::shouldRelocateWithSymbol
* The legacy generic code uses `ELFObjectWriter::fixSymbolsInTLSFixups` to set `STT_TLS` (and use an unnecessary expression walk). The better way is to do this in `getRelocType`, which I have done for AArch64, PowerPC, and RISC-V.

In the future, we should encode expressions with a relocation specifier as X86MCExpr and use MCValue::RefKind to hold the specifier of the relocatable expression. https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers

While here, rename "Modifier" to "Specifier":

> "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.

Pull Request: https://github.com/llvm/llvm-project/pull/132149
2025-03-17 | [X86] X86MCInstLower.cpp - printConstant - don't assume the source constant data is smaller than the printed data | Simon Pilgrim | 1 | -15/+21
Bail out if the constant types aren't compatible.

Fixes #131389
2025-03-06 | [IR] Store Triple in Module (NFC) (#129868) | Nikita Popov | 1 | -3/+2
The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried.

For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.
2025-03-05 | [MC] Remove unneeded VK_None argument from MCSymbolRefExpr::create. NFC | Fangrui Song | 1 | -2/+1
2025-01-30 | [llvm] Win x64 Unwind V2 1/n: Mark beginning and end of epilogs (#110024) | Daniel Paoliello | 1 | -1/+17
Windows x64 Unwind V2 adds epilog information to unwind data: specifically, the length of the epilog and the offset of each epilog. The first step to do this is to add markers to the beginning and end of each epilog when generating Windows x64 code. I've modelled this after how LLVM was marking ARM and AArch64 epilogs in Windows (and unified the code between the three).
2025-01-28 | [nfc][llvm] Clean up isUEFI checks (#124845) | Prabhuk | 1 | -1/+1
The check for `isOSWindows() || isUEFI()` is used in several places across the codebase. Introducing `isOSWindowsOrUEFI()` in Triple.h to simplify these checks.
2024-11-09 | [X86] Remove unused includes (NFC) (#115593) | Kazu Hirata | 1 | -2/+0
Identified with misc-include-cleaner.
2024-09-20 | [llvm] Don't call raw_string_ostream::flush() (NFC) | Youngsuk Kim | 1 | -1/+0
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 | [MC] Explicitly mark MCSymbol for MO_ExternalSymbol (#108880) | weiwei chen | 1 | -1/+5
- [x] Mark `MCSymbol` for `MO_ExternalSymbol` to be external when created.
2024-09-18 | [X86] Cleanup AVX512 VBROADCAST subvector instruction names. (#108888) | Simon Pilgrim | 1 | -10/+10
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
2024-07-20 | X86: Avoid using MachineFunction::getMMI | Matt Arsenault | 1 | -2/+2
2024-07-08 | [X86] Support branch hint (#97721) | Feng Zou | 1 | -0/+24
For more details about this feature, please refer to the latest Intel 64 and IA-32 Architectures Optimization Reference Manual Volume 1: https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
2024-06-22 | [X86][MC] Drop optional from LowerMachineOperand (#96338) | Alexis Engelke | 1 | -21/+22
This caused the MCOperand to be returned in memory. An MCOperand is only 16 bytes and therefore can be returned in registers on x86-64 and AArch64 (and others).
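The size argument can be checked with a hypothetical 16-byte operand type. `Operand` below is an illustrative stand-in, not `MCOperand`; on common 64-bit ABIs it occupies 16 bytes, while wrapping it in `std::optional` adds an engaged flag and makes the result larger, which is what pushes the return value out of registers.

```cpp
#include <cstdint>
#include <optional>
#include <type_traits>

// Hypothetical stand-in for a small tagged operand: a kind byte plus an
// 8-byte payload (16 bytes on common 64-bit ABIs once padded).
struct Operand {
  std::uint8_t Kind;
  std::uint64_t Payload;
};

// Small trivially copyable aggregates are the kind of type that calling
// conventions can return in registers.
static_assert(std::is_trivially_copyable_v<Operand>);

// The optional wrapper needs room for its "has value" flag in addition to
// the payload, so it is strictly larger than the bare operand.
constexpr bool OptionalIsLarger =
    sizeof(std::optional<Operand>) > sizeof(Operand);
```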
2024-06-15 | [X86] Lower vXi8 multiplies by constant using PMADDUBSW on SSSE3+ targets (#95403) | Simon Pilgrim | 1 | -0/+17
As discussed on #90748 - we can avoid unpacks/extensions from vXi8 to vXi16 by using PMADDUBSW instead and packing the vXi16 results back together.
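A scalar model shows why PMADDUBSW can produce wrapping byte products. This sketch is one possible arrangement and not necessarily the exact sequence LLVM emits: it zeroes the odd (or even) constant bytes so each 16-bit lane holds a single product, then keeps the low 8 bits. `pmaddubswLane` and `mulBytesByConstant` are hypothetical helper names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of one 16-bit PMADDUBSW lane: unsigned bytes from A times
// signed bytes from B, adjacent products added with signed saturation.
std::int16_t pmaddubswLane(std::uint8_t A0, std::uint8_t A1, std::int8_t B0,
                           std::int8_t B1) {
  int Sum = int(A0) * int(B0) + int(A1) * int(B1);
  return static_cast<std::int16_t>(std::clamp(Sum, -32768, 32767));
}

// Wrapping i8 multiply of 16 bytes by 16 constant bytes. With one constant
// byte of each pair zeroed, a lane holds a single u8*s8 product, which
// cannot saturate (|product| <= 255 * 128), and its low 8 bits equal the
// wrapping byte product.
std::array<std::uint8_t, 16>
mulBytesByConstant(const std::array<std::uint8_t, 16> &A,
                   const std::array<std::uint8_t, 16> &C) {
  std::array<std::uint8_t, 16> R{};
  for (std::size_t I = 0; I < 16; I += 2) {
    std::int16_t Even =
        pmaddubswLane(A[I], A[I + 1], static_cast<std::int8_t>(C[I]), 0);
    std::int16_t Odd =
        pmaddubswLane(A[I], A[I + 1], 0, static_cast<std::int8_t>(C[I + 1]));
    R[I] = static_cast<std::uint8_t>(Even & 0xFF);    // low byte of even lane
    R[I + 1] = static_cast<std::uint8_t>(Odd & 0xFF); // low byte of odd lane
  }
  return R;
}
```

Reinterpreting the constant byte as signed does not change the low 8 bits of the product, which is why the unsigned-by-signed instruction is usable for a wrapping multiply.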
2024-06-14 | [MC][X86] addConstantComments - add mul vXi16 comments | Simon Pilgrim | 1 | -0/+39
Based on feedback from #95403 - we use multiply by constant for various lowerings (shifts, division etc.), so it's very useful to print out the constants to help understand the transform involved. vXi16 multiplies are the easiest to add for this initial commit, but we can add other arithmetic instructions as follow-ups when the need arises (I intend to add PMADDUBSW handling for #95403 next). I've done my best to update all test checks but there are bound to be ones that got missed that will only appear when the file is regenerated.
2024-05-27 | [XRay][X86] Handle conditional calls when lowering patchable tail calls (#89364) | Ricky Zhou | 1 | -6/+34
xray instruments tail call function exits by inserting a nop sled before the tail call. When tracing is enabled, the nop sled is replaced with a call to `__xray_FunctionTailExit()`. This currently does not work for conditional tail calls, as the instrumentation assumes that the tail call will be unconditional. This causes two issues:
- `__xray_FunctionTailExit()` is inappropriately called even when the tail call is not taken.
- `__xray_FunctionTailExit()`'s prologue/epilogue adjusts the stack pointer with add/sub instructions. This clobbers condition flags, which can flip the condition used for the tail call, leading to incorrect program behavior.

Fix this by rewriting conditional calls when lowering patchable tail calls. With this change, a conditional patchable tail call like:
```
  je target
```
will be lowered to:
```
  jne .fallthrough
  .p2align 1, ..
.Lxray_sled_N:
  SLED_CODE
  jmp target
.fallthrough:
```
2024-04-24 | [CodeGen] Make the parameter TRI required in some functions. (#85968) | Xu Zhang | 1 | -1/+2
Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-08 | [Codegen][X86] Fix /HOTPATCH with clang-cl and inline asm (#87639) | Alexandre Ganea | 1 | -1/+3
This fixes an edge case where functions starting with inline assembly would assert while trying to lower that inline asm instruction. After this PR, for now we always add a no-op (xchgw in this case) without considering the size of the next inline asm instruction. We might want to revisit this in the future. This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH. Should close https://github.com/llvm/llvm-project/issues/56234
2024-03-15 | [X86] Add Support for X86 TLSDESC Relocations (#83136) | Phoebe Wang | 1 | -5/+28
2024-03-06 | [MC] Move CompressDebugSections/RelaxELFRelocations from TargetOptions/MCAsmInfo to MCTargetOptions | Fangrui Song | 1 | -3/+3
The convention is for such MC-specific options to reside in MCTargetOptions. However, CompressDebugSections/RelaxELFRelocations do not follow the convention: `CompressDebugSections` is defined in both TargetOptions and MCAsmInfo and there is forwarding complexity.

Move the option to MCTargetOptions and hereby simplify the code. Rename the misleading RelaxELFRelocations to X86RelaxRelocations. llvm-mc -relax-relocations and llc -x86-relax-relocations can now be unified.
2024-02-08 | [X86] Add X86::getVectorRegisterWidth helper. NFC. | Simon Pilgrim | 1 | -18/+6
Replaces internal helper used by addConstantComments to allow reuse in a future patch.
2024-02-05 | [X86] addConstantComments - add FP16 MOVSH asm comments support | Simon Pilgrim | 1 | -0/+6
2024-02-05 | [X86] printZeroUpperMove - add support for mask predicated instructions | Simon Pilgrim | 1 | -5/+7
Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register This will more significantly help improve 'fixup constant' comments from #73509
2024-02-05 | [X86] printBroadcast - add support for mask predicated instructions | Simon Pilgrim | 1 | -48/+59
Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register This will more significantly help improve 'fixup constant' comments from #73509
2024-02-05 | [X86] printExtend - add support for mask predicated instructions | Simon Pilgrim | 1 | -28/+26
Remove handling from EmitAnyX86InstComments and handle all VPMOVSX/VPMOVZX comments in addConstantComments now that we can generically handle the destination + mask register and shuffle mask comment
2024-02-05 | [X86] Split up getShuffleComment into printShuffleMask and printDstRegisterName helpers. NFC. | Simon Pilgrim | 1 | -34/+35
This will allow us to easily use printDstRegisterName for other mask predicate destination registers, and printout shuffle masks from other instruction types.
2024-02-05 | [X86] getShuffleComment - use MI description to determine AVX512 masked predicates instead of src index offsets. | Simon Pilgrim | 1 | -9/+4
2024-02-05 | [X86] addConstantComments - split VPERMILPS/VPERMILPD handling to reduce repeated switch cases etc. NFC. | Simon Pilgrim | 1 | -33/+12
2024-02-05 | [X86] Add common getSrcIdx helper to determine source index after AVX512 masked predicates. NFC. | Simon Pilgrim | 1 | -20/+14
2024-02-05 | [X86] X86FixupVectorConstants - load+zero vector constants that can be stored in a truncated form (#80428) | Simon Pilgrim | 1 | -15/+65
Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extending it as part of the load.
2024-02-02 | [X86] Fix -Wsign-compare in X86MCInstLower.cpp (NFC) | Jie Fu | 1 | -1/+1
llvm-project/llvm/lib/Target/X86/X86MCInstLower.cpp:1588:48: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]
    if (C && C->getType()->getScalarSizeInBits() == SrcEltBits) {
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~
1 error generated.
2024-02-02 | [X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815) | Simon Pilgrim | 1 | -1/+61
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.

I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.

I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
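Whether a constant can be stored truncated comes down to a range check on every element. The sketch below shows that check for the sign-extend case and the analogous zero-extend case; `fitsSignExtended` and `fitsZeroExtended` are hypothetical helpers, not functions from X86FixupVectorConstants, and both assume `1 <= NarrowBits <= 63`.

```cpp
#include <cstdint>
#include <vector>

// True if every element is unchanged by truncating to NarrowBits and then
// sign-extending back. Assumes 1 <= NarrowBits <= 63.
bool fitsSignExtended(const std::vector<std::int64_t> &Elts,
                      unsigned NarrowBits) {
  const std::int64_t Max = (std::int64_t{1} << (NarrowBits - 1)) - 1;
  const std::int64_t Min = -Max - 1;
  for (std::int64_t V : Elts)
    if (V < Min || V > Max)
      return false;
  return true;
}

// True if every element is unchanged by truncating to NarrowBits and then
// zero-extending back. Assumes 1 <= NarrowBits <= 63.
bool fitsZeroExtended(const std::vector<std::uint64_t> &Elts,
                      unsigned NarrowBits) {
  const std::uint64_t Max = (std::uint64_t{1} << NarrowBits) - 1;
  for (std::uint64_t V : Elts)
    if (V > Max)
      return false;
  return true;
}
```

For example, a vector of 32-bit elements whose values all lie in [-128, 127] passes the sign-extend check for 8 bits, so in principle it could be stored as bytes and widened by the load.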
2024-01-23 | [MC][X86] Merge lane/element broadcast comment printers. (#79020) | Simon Pilgrim | 1 | -73/+23
This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.
2024-01-22 | [CodeGen][X86] Fix lowering of tailcalls when `-ms-hotpatch` is used (#77245) | Alexandre Ganea | 1 | -22/+11
Previously, tail jump pseudo-opcodes were skipped by the `encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`. This caused emission of a 2-byte NOP and dropping of the tail jump.

With this PR, we change `PATCHABLE_OP` to no longer wrap the first `MachineInstr`, but to insert itself before it, leaving the instruction unaltered. At lowering time in `X86AsmPrinter`, we now "look ahead" for the next non-pseudo `MachineInstr` and lower+encode it, to inspect its size. If the size is below what `PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That way, the first `MachineInstr` is now always lowered as usual even if `"patchable-function"="prologue-short-redirect"` is used.

Fixes https://github.com/llvm/llvm-project/issues/76879, https://github.com/llvm/llvm-project/issues/76958 and https://github.com/llvm/llvm-project/issues/59039
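The "look ahead" step can be sketched over a toy instruction list. `ToyInst` and `needsPatchableNop` are hypothetical names, the encoded size is supplied directly instead of being computed by an encoder, and returning true when no real instruction follows is an assumption the message does not cover.

```cpp
#include <vector>

// Hypothetical instruction record: pseudo-instructions emit no bytes.
struct ToyInst {
  bool IsPseudo;
  unsigned EncodedSize; // size in bytes once lowered and encoded
};

// Looks ahead past pseudo-instructions to the next real instruction and
// reports whether it is shorter than the size the patchable op expects,
// in which case NOP padding is needed.
bool needsPatchableNop(const std::vector<ToyInst> &Following,
                       unsigned MinSize) {
  for (const ToyInst &I : Following) {
    if (I.IsPseudo)
      continue; // skip: contributes no bytes
    return I.EncodedSize < MinSize;
  }
  return true; // assumption: nothing real follows, so pad
}
```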
2024-01-22 | [X86] printConstant - add ConstantVector handling | Simon Pilgrim | 1 | -11/+17
2024-01-22 | [X86] printZeroUpperMove - add support for constant vectors. | Simon Pilgrim | 1 | -35/+26
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
2024-01-22 | [X86] Update X86::getConstantFromPool to take base OperandNo instead of Displacement MachineOperand | Simon Pilgrim | 1 | -36/+8
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
2024-01-22 | [X86] Add printElementBroadcast constant comments helper. NFC. | Simon Pilgrim | 1 | -70/+65
Pull out helper instead of repeating switch cases.
2024-01-22 | [X86] Add printLaneBroadcast constant comments helper. NFC. | Simon Pilgrim | 1 | -86/+72
Pull out helper instead of repeating switch cases.