path: root/llvm/lib/Target/X86/X86FastISel.cpp
4 days    [llvm] Extract and propagate callee_type metadata    (Prabhu Rajasekaran, 1 file, -0/+8)
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds from callee_type metadata attached to indirect call instructions.
Reviewers: nikic, ilovepi
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87575
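For context, a minimal sketch of the kind of call site this change consumes; the exact callee_type metadata schema shown here is an assumption for illustration, not taken from the commit:
```
; Hypothetical indirect call carrying !callee_type metadata; per the commit,
; a numeric CalleeTypeId is extracted from such metadata into
; MachineFunction::CallSiteInfo.
define void @caller(ptr %fp) {
  call void %fp(), !callee_type !0
  ret void
}
!0 = !{!1}
!1 = !{i64 0, !"_ZTSFvvE.generalized"}
```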
2025-05-20    [x64][win] Add compiler support for x64 import call optimization (equivalent to MSVC /d2guardretpoline) (#126631)    (Daniel Paoliello, 1 file, -0/+6)
This is the x64 equivalent of #121516. Since import call optimization was originally [added to x64 Windows to implement a more efficient retpoline mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618), the section and constant names relating to this all mention "retpoline", and we need to mark indirect calls, control-flow guard calls and jumps for jump tables in the section alongside calls to imported functions. As with the AArch64 feature, this emits a new section into the obj which is used by the MSVC linker to generate the Dynamic Value Relocation Table; the section itself does not appear in the final binary. The Windows Loader requires a specific sequence of instructions be emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in `rax`.
* Calls to imported functions must use the `rex` prefix and be followed by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
2025-04-17    [X86][FastISel] Refactor bitcast selection into separate function (NFC)    (Antonio Frighetto, 1 file, -30/+33)
2025-03-15    [X86] Use Register and MCRegister. NFC    (Craig Topper, 1 file, -5/+5)
2025-03-09    [X86] Use Register in FastISel. NFC    (Craig Topper, 1 file, -87/+92)
Replace 'Reg == 0' with '!Reg'
2025-03-05    [FastISel] Use Register. NFC    (Craig Topper, 1 file, -16/+16)
This focuses on the common interfaces and tablegen. More changes are needed to individual targets.
2025-02-19    X86: Fix convertToThreeAddress losing subregister indexes (#124098)    (Matt Arsenault, 1 file, -1/+2)
This avoids dozens of regressions in a future patch. These primarily manifested as assertions where we had copies of 64-bit registers to 32-bit registers. This is testable in principle with hand written MIR, but that's a bit too much x86 for me.
2024-11-09    [X86] Remove unused includes (NFC) (#115593)    (Kazu Hirata, 1 file, -1/+0)
Identified with misc-include-cleaner.
2024-10-18    [NFC][X86] Fix Werror=extra error due to enum (#112812)    (Jinsong Ji, 1 file, -1/+1)
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on, built by GCC 11. Fix warnings:
llvm-project/llvm/lib/Target/X86/X86FastISel.cpp: In member function ‘virtual bool {anonymous}::X86FastISel::fastLowerCall(llvm::FastISel::CallLoweringInfo&)’:
llvm-project/llvm/lib/Target/X86/X86FastISel.cpp:3547: error: enumerated and non-enumerated type in conditional expression [-Werror=extra]
 3547 | MIB.addReg(Is64Bit ? X86::RIP : 0).addImm(1).addReg(0);
2024-09-30    FastISel: Fix incorrectly using getPointerTy (#110465)    (Matt Arsenault, 1 file, -1/+3)
This was using the default address space instead of the correct one. Fixes #56055
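As an illustration (a hedged sketch, not from the patch): on X86, address spaces 270-272 use non-default pointer sizes, so querying getPointerTy for the default address space yields the wrong type for inputs like this:
```
; A 32-bit pointer (addrspace(270) in the X86 datalayout); FastISel must
; query the pointer type of this address space, not the default one.
define i32 @load_mixed(ptr addrspace(270) %p) {
  %v = load i32, ptr addrspace(270) %p
  ret i32 %v
}
```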
2024-08-04    [CodeGen][NFC] Add wrapper method for MBBMap (#101893)    (Alexis Engelke, 1 file, -3/+3)
This is a preparation for changing the data structure of MBBMap.
2024-03-17    [X86] Support APX CMOV/CFCMOV instructions (#82592)    (XinWang10, 1 file, -1/+2)
This patch supports the ND CMOV instructions and the CFCMOV instructions.
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-11    [X86] Add missing register qualifier to the VBLENDVPD/VBLENDVPS/VPBLENDVB instruction names    (Simon Pilgrim, 1 file, -1/+1)
Matches the SSE variants (which have a 0 qualifier to indicate the xmm0 explicit dependency).
2024-03-09    [X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)    (Simon Pilgrim, 1 file, -4/+4)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-02-06    [FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803)    (Craig Topper, 1 file, -7/+6)
The comment and code here seem to match getTypeForExtReturn. The history shows that at the time this code was added, similar code existed in SelectionDAGBuilder. The SelectionDAGBuilder code has since been refactored into getTypeForExtReturn. This patch makes FastISel match SelectionDAGBuilder. The test changes are because X86 has customization of getTypeForExtReturn, so now we only extend returns to i8. Stumbled onto this difference by accident.
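A hedged example of the affected pattern: a small boolean return that, per the commit text, FastISel now extends only to i8 on X86, matching SelectionDAG:
```
; Returns an i1; with X86's getTypeForExtReturn customization, FastISel
; extends this return only to i8 rather than to a wider integer type.
define zeroext i1 @is_zero(i32 %x) {
  %c = icmp eq i32 %x, 0
  ret i1 %c
}
```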
2024-01-05    [X86] Support lowering for APX promoted SHA/MOVDIR/CRC32/INVPCID/CET instructions (#76786)    (XinWang10, 1 file, -4/+6)
R16-R31 were added to the GPRs in https://github.com/llvm/llvm-project/pull/70958; this patch supports the lowering for the promoted SHA/MOVDIR/CRC32/INVPCID/CET instructions.
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-04    [IR] Fix GEP offset computations for vector GEPs (#75448)    (Jannik Silvanus, 1 file, -1/+1)
Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases (a sketch of the second case follows this entry):
* Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
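A small sketch of the distinction, under the assumption that the target's datalayout overaligns i16 (an illustrative setup, not taken from the patch): the offset of element %i below is 2*%i, the bit-packed sequential element stride, regardless of the element's alignment:
```
; GEP into a vector: the per-element stride is the packed size (2 bytes
; for i16), i.e. GTI.getSequentialElementStride(DL), not
; DL.getTypeAllocSize, which would grow with overalignment.
define ptr @vec_elt(ptr %base, i64 %i) {
  %p = getelementptr <4 x i16>, ptr %base, i64 0, i64 %i
  ret ptr %p
}
```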
2023-12-24    [X86] Use range-based for loops (NFC)    (Kazu Hirata, 1 file, -4/+3)
2023-12-14    Re-Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)    (Arthur Eubanks, 1 file, -5/+3)
For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT. Original commit broke the build... First reland broke large PIC builds referencing small data since it was using GOTOFF as a 32-bit constant.
2023-12-14    Revert "Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)"    (Arthur Eubanks, 1 file, -3/+5)
This reverts commit ec92d74a0ef89b9dd46aee6ec8aca6bfd3c66a54. Breaks some compiler-rt tests, e.g. https://lab.llvm.org/buildbot/#/builders/37/builds/28834
2023-12-14    Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)    (Arthur Eubanks, 1 file, -5/+3)
For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT. Original commit broke the build...
2023-12-14    Revert "[X86] Respect code models more when determining if a global reference can fit in 32 bits" (#75500)    (Arthur Eubanks, 1 file, -2/+4)
Reverts llvm/llvm-project#75386. Breaks build.
2023-12-14    [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)    (Arthur Eubanks, 1 file, -4/+2)
For non-GlobalValue references, the small and medium code models can use 32 bit constants. For GlobalValue references, use TargetMachine::isLargeGlobalObject(). Look through aliases for determining if a GlobalValue is small or large. Even the large code model can reference small objects with 32 bit constants as long as we're in no-pic mode, or if the reference is offset from the GOT.
2023-12-13    [X86][FastISel] Support medium code model in more places (#75375)    (Arthur Eubanks, 1 file, -6/+12)
The medium code model is basically identical to the small code model except that large objects cannot be referenced with 32-bit offsets.
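For illustration, a minimal module one could feed to llc with -code-model=medium; the global here is small, so a 32-bit reference remains legal under the medium model (the setup is assumed for the example, not taken from the patch):
```
; Try: llc -code-model=medium -fast-isel %s
; @counter is a small object, so under the medium code model it can still
; be referenced with a 32-bit offset; only large objects cannot.
@counter = global i32 0
define i32 @read_counter() {
  %v = load i32, ptr @counter
  ret i32 %v
}
```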
2023-12-12    [X86][FastISel] Bail out on large objects when materializing a GlobalValue    (Arthur Eubanks, 1 file, -0/+6)
To avoid crashes with explicitly large objects. I will clean up fast-isel with large objects/medium code model soon.
2023-11-09    Obsolete WebKit Calling Convention (#71567)    (Juergen Ributzka, 1 file, -2/+1)
The WebKit Calling Convention was created specifically for the WebKit FTL. FTL doesn't use LLVM anymore and therefore this calling convention is obsolete. This commit removes the WebKit CC, its associated tests, and documentation.
2023-10-03    [X86][FastISel] X86SelectIntToFP - don't assume value type is simple.    (Simon Pilgrim, 1 file, -1/+1)
Fixes #68068
2023-09-28    [X86FastISel] Avoid ConstantExpr::getZExt() (NFC)    (Nikita Popov, 1 file, -2/+2)
Work on APInt instead.
2023-09-10    [X86] FastISel -fno-pic: emit R_386_PC32 when calling an intrinsic    (Fangrui Song, 1 file, -0/+4)
This matches how a SelectionDAG::getExternalSymbol node is lowered. On x86-32, a function call in -fno-pic code should emit R_386_PC32 (since ebx is not set up). When linked as -shared (problematic!), the generated text relocation will work. Ideally, we should mark IR intrinsics created in CodeGenFunction::EmitBuiltinExpr as dso_local, but the code structure makes it not very feasible. Fix #51078
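A hedged reproducer sketch: a memcpy intrinsic large enough to lower to a call to the external symbol "memcpy", which in -fno-pic i386 code should be emitted with R_386_PC32 (the function and size here are illustrative assumptions):
```
; Build for i386 with a static relocation model; the lowered libcall to
; "memcpy" should get an R_386_PC32 relocation rather than a PLT form.
define void @copy4k(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i32(ptr %dst, ptr %src, i32 4096, i1 false)
  ret void
}
declare void @llvm.memcpy.p0.p0.i32(ptr, ptr, i32, i1)
```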
2023-05-19    Revert "Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI""    (Shengchen Kan, 1 file, -10/+1)
This reverts commit cb16b33a03aff70b2499c3452f2f817f3f92d20d. In fact, the test https://bugs.chromium.org/p/chromium/issues/detail?id=1446973#c2 already passed after 5586bc539acb26cb94e461438de01a5080513401
2023-05-19    Revert "[X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI"    (Hans Wennborg, 1 file, -1/+10)
This caused compiler assertions, see comment on https://reviews.llvm.org/D150107. This also reverts the dependent follow-up change:

> [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
>
> This is follow-up of D150107.
>
> In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
>
> Differential Revision: https://reviews.llvm.org/D150949

This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and 5586bc539acb26cb94e461438de01a5080513401.
2023-05-19    [X86] Remove patterns for ADC/SBB with immediate 8 and optimize during MC lowering, NFCI    (Shengchen Kan, 1 file, -10/+1)
This is follow-up of D150107. In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
Differential Revision: https://reviews.llvm.org/D150949
2023-04-21    [X86][FastISel] Handle CRC32 intrinsics    (Alexis Engelke, 1 file, -0/+52)
Some applications make heavy use of the crc32 operation (e.g., as part of a hash function), so having a FastISel path avoids fallbacks to SelectionDAG and improves compile times, in our case by ~1.5%. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D148023
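As a usage illustration, one of the SSE4.2 CRC32 intrinsics this FastISel path can now select; the commit message does not spell out the exact set of handled widths, so treat this as a representative example:
```
; CRC32 accumulation over a single byte via the SSE4.2 intrinsic;
; selecting this in FastISel avoids a SelectionDAG fallback.
define i32 @crc_byte(i32 %crc, i8 %b) {
  %r = call i32 @llvm.x86.sse42.crc32.32.8(i32 %crc, i8 %b)
  ret i32 %r
}
declare i32 @llvm.x86.sse42.crc32.32.8(i32, i8)
```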
2023-01-23    [MC] Define and use MCInstrDesc implicit_uses and implicit_defs. NFC.    (Jay Foad, 1 file, -1/+1)
The new methods return a range for easier iteration. Use them everywhere instead of getImplicitUses, getNumImplicitUses, getImplicitDefs and getNumImplicitDefs. A future patch will remove the old methods. In some use cases the new methods are less efficient because they always have to scan the whole uses/defs array to count its length, but that will be fixed in a future patch by storing the number of implicit uses/defs explicitly in MCInstrDesc. At that point there will be no need to 0-terminate the arrays. Differential Revision: https://reviews.llvm.org/D142215
2023-01-04    [MC] Consistently use MCInstrDesc::getImplicitUses and getImplicitDefs. NFC.    (Jay Foad, 1 file, -2/+3)
2022-09-12    [llvm] Use x.empty() instead of llvm::empty(x) (NFC)    (Kazu Hirata, 1 file, -3/+1)
I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677
2022-09-07    [FastISel] Propagate PCSections metadata to MachineInstr    (Marco Elver, 1 file, -102/+102)
Propagate PC sections metadata to MachineInstr when FastISel is doing instruction selection. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130884
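A minimal sketch of an instruction carrying the metadata that FastISel now propagates; the metadata operands here are illustrative assumptions:
```
; The !pcsections annotation asks for this instruction's PC to be
; recorded in the named section of the object file; FastISel now carries
; it through to the resulting MachineInstr.
define i32 @annotated_load(ptr %p) {
  %v = load i32, ptr %p, !pcsections !0
  ret i32 %v
}
!0 = !{!"foo_section"}
```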
2022-08-24    KCFI sanitizer    (Sami Tolvanen, 1 file, -0/+4)
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality.

A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code:
```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
f:
  ...
```
Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c95494ff05be2a613129110c9bcf17f6c13 with a fix for 32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
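For illustration, the `kcfi` operand bundle mentioned above looks roughly like this at the IR level (the type-id value is a made-up placeholder):
```
; Indirect call guarded by KCFI: the "kcfi" operand bundle carries the
; expected 32-bit type identifier, which the backend checks against the
; callee's preamble before the call.
define void @dispatch(ptr %fp) {
  call void %fp() [ "kcfi"(i32 12345678) ]
  ret void
}
```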
2022-08-24    Revert "KCFI sanitizer"    (Sami Tolvanen, 1 file, -4/+0)
This reverts commit 67504c95494ff05be2a613129110c9bcf17f6c13 as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24    KCFI sanitizer    (Sami Tolvanen, 1 file, -0/+4)
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality.

A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code:
```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
f:
  ...
```
Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. See the comments in `X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-08    [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC    (Fangrui Song, 1 file, -9/+9)
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-06-17    Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""    (Phoebe Wang, 1 file, -7/+10)
This resolves problems reported in commit 1a20252978c76cf2518aa45b175a9e5d6d36c4f0.
1. Promote to float lowering for XINT_TO_FP nodes.
2. Bail out f16 from shuffle combine because the vector type is not legal in this version.
2022-06-17    Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""    (Benjamin Kramer, 1 file, -10/+7)
This reverts commit 04a3d5f3a1193fb87576425a385aa0a6115b1e7c.

I see two more issues:
- uitofp/sitofp from i32/i64 to half now generates __floatsihf/__floatdihf, which exists in neither compiler-rt nor libgcc
- This crashes when legalizing the bitcast:
```
; RUN: llc < %s -mcpu=skx
define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr {
entry:
  %fusion = load ptr, ptr %buffer_table, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_1.2 = load ptr, ptr %0, align 8
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2
  %Arg_0.1 = load ptr, ptr %1, align 8
  %2 = load half, ptr %Arg_0.1, align 8
  %3 = bitcast half %2 to i16
  %4 = and i16 %3, 32767
  %5 = icmp eq i16 %4, 0
  %6 = and i16 %3, -32768
  %broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0
  %broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0
  %broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0
  %broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0
  %broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer
  %wide.load = load <4 x half>, ptr %Arg_1.2, align 8
  %7 = fcmp uno <4 x half> %broadcast.splat, %wide.load
  %8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load
  %9 = bitcast <4 x half> %wide.load to <4 x i16>
  %10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767>
  %11 = icmp eq <4 x i16> %10, zeroinitializer
  %12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768>
  %13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1>
  %14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13
  %15 = icmp ugt <4 x i16> %broadcast.splat10, %10
  %16 = icmp ne <4 x i16> %broadcast.splat12, %12
  %17 = or <4 x i1> %15, %16
  %18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>
  %19 = add <4 x i16> %18, %broadcast.splat14
  %20 = select i1 %5, <4 x i16> %14, <4 x i16> %19
  %21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20
  %22 = bitcast <4 x i16> %21 to <4 x half>
  %23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22
  store <4 x half> %23, ptr %fusion, align 16
  ret void
}
```
llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.
2022-06-17    Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""    (Phoebe Wang, 1 file, -7/+10)
Fix the crash on lowering X86ISD::FCMP.
2022-06-15    Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""    (Frederik Gossen, 1 file, -10/+7)
This reverts commit e1c5afa47d37012499467b5061fc42e50884d129. This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is below. Let me know if you have trouble reproducing this.
```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\00\00\00?"
@1 = private unnamed_addr constant [4 x i8] c"\1C}\908"
@2 = private unnamed_addr constant [4 x i8] c"?\00\\4"
@3 = private unnamed_addr constant [4 x i8] c"%ci1"
@4 = private unnamed_addr constant [4 x i8] zeroinitializer
@5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0"
@6 = private unnamed_addr constant [4 x i8] c"\00\00\00B"
@7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22"
@8 = private unnamed_addr constant [4 x i8] c"^\09B6"
@9 = private unnamed_addr constant [4 x i8] c"\15\F3M?"
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;"
@11 = private unnamed_addr constant [4 x i8] c"d\BD/>"
@12 = private unnamed_addr constant [4 x i8] c"V\F4I="
@13 = private unnamed_addr constant [4 x i8] c"\10\CB,<"
@14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:"
@15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9"
@16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897"
@17 = private unnamed_addr constant [4 x i8] c"%\F9\955"
@18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813"
@19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2"
@20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4"
@21 = private unnamed_addr constant [4 x i8] c"~3\94\B6"
@22 = private unnamed_addr constant [4 x i8] c"3Yq\B8"
@23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA"
@24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB"
@25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC"
@26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD"
@27 = private unnamed_addr constant [4 x i8] c"uB-?"
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE"
@29 = private unnamed_addr constant [4 x i8] c"\00\00\00A"

; Function Attrs: uwtable
define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 {
entry:
  %fusion.invar_address.dim.1 = alloca i64, align 8
  %fusion.invar_address.dim.0 = alloca i64, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0
  %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  store i64 0, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

return:                                           ; preds = %fusion.loop_exit.dim.0
  ret void

fusion.loop_header.dim.0:                         ; preds = %fusion.loop_exit.dim.1, %entry
  %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8
  %2 = icmp uge i64 %fusion.indvar.dim.0, 3
  br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0:                           ; preds = %fusion.loop_header.dim.0
  store i64 0, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1:                         ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0
  %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8
  %3 = icmp uge i64 %fusion.indvar.dim.1, 1
  br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0
  %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3
  %6 = fpext half %5 to float
  %7 = call float @llvm.fabs.f32(float %6)
  %constant.121 = load float, ptr @29, align 4
  %compare.2 = fcmp ole float %7, %constant.121
  %8 = zext i1 %compare.2 to i8
  %constant.120 = load float, ptr @0, align 4
  %multiply.95 = fmul float %7, %constant.120
  %constant.119 = load float, ptr @5, align 4
  %add.82 = fadd float %multiply.95, %constant.119
  %constant.118 = load float, ptr @4, align 4
  %multiply.94 = fmul float %add.82, %constant.118
  %constant.117 = load float, ptr @19, align 4
  %add.81 = fadd float %multiply.94, %constant.117
  %multiply.92 = fmul float %add.82, %add.81
  %constant.116 = load float, ptr @18, align 4
  %add.79 = fadd float %multiply.92, %constant.116
  %multiply.91 = fmul float %add.82, %add.79
  %subtract.87 = fsub float %multiply.91, %add.81
  %constant.115 = load float, ptr @20, align 4
  %add.78 = fadd float %subtract.87, %constant.115
  %multiply.89 = fmul float %add.82, %add.78
  %subtract.86 = fsub float %multiply.89, %add.79
  %constant.114 = load float, ptr @17, align 4
  %add.76 = fadd float %subtract.86, %constant.114
  %multiply.88 = fmul float %add.82, %add.76
  %subtract.84 = fsub float %multiply.88, %add.78
  %constant.113 = load float, ptr @21, align 4
  %add.75 = fadd float %subtract.84, %constant.113
  %multiply.86 = fmul float %add.82, %add.75
  %subtract.83 = fsub float %multiply.86, %add.76
  %constant.112 = load float, ptr @16, align 4
  %add.73 = fadd float %subtract.83, %constant.112
  %multiply.85 = fmul float %add.82, %add.73
  %subtract.81 = fsub float %multiply.85, %add.75
  %constant.111 = load float, ptr @22, align 4
  %add.72 = fadd float %subtract.81, %constant.111
  %multiply.83 = fmul float %add.82, %add.72
  %subtract.80 = fsub float %multiply.83, %add.73
  %constant.110 = load float, ptr @15, align 4
  %add.70 = fadd float %subtract.80, %constant.110
  %multiply.82 = fmul float %add.82, %add.70
  %subtract.78 = fsub float %multiply.82, %add.72
  %constant.109 = load float, ptr @23, align 4
  %add.69 = fadd float %subtract.78, %constant.109
  %multiply.80 = fmul float %add.82, %add.69
  %subtract.77 = fsub float %multiply.80, %add.70
  %constant.108 = load float, ptr @14, align 4
  %add.68 = fadd float %subtract.77, %constant.108
  %multiply.79 = fmul float %add.82, %add.68
  %subtract.75 = fsub float %multiply.79, %add.69
  %constant.107 = load float, ptr @24, align 4
  %add.67 = fadd float %subtract.75, %constant.107
  %multiply.77 = fmul float %add.82, %add.67
  %subtract.74 = fsub float %multiply.77, %add.68
  %constant.106 = load float, ptr @13, align 4
  %add.66 = fadd float %subtract.74, %constant.106
  %multiply.76 = fmul float %add.82, %add.66
  %subtract.72 = fsub float %multiply.76, %add.67
  %constant.105 = load float, ptr @25, align 4
  %add.65 = fadd float %subtract.72, %constant.105
  %multiply.74 = fmul float %add.82, %add.65
  %subtract.71 = fsub float %multiply.74, %add.66
  %constant.104 = load float, ptr @12, align 4
  %add.64 = fadd float %subtract.71, %constant.104
  %multiply.73 = fmul float %add.82, %add.64
  %subtract.69 = fsub float %multiply.73, %add.65
  %constant.103 = load float, ptr @26, align 4
  %add.63 = fadd float %subtract.69, %constant.103
  %multiply.71 = fmul float %add.82, %add.63
  %subtract.67 = fsub float %multiply.71, %add.64
  %constant.102 = load float, ptr @11, align 4
  %add.62 = fadd float %subtract.67, %constant.102
  %multiply.70 = fmul float %add.82, %add.62
  %subtract.66 = fsub float %multiply.70, %add.63
  %constant.101 = load float, ptr @28, align 4
  %add.61 = fadd float %subtract.66, %constant.101
  %multiply.68 = fmul float %add.82, %add.61
  %subtract.65 = fsub float %multiply.68, %add.62
  %constant.100 = load float, ptr @27, align 4
  %add.60 = fadd float %subtract.65, %constant.100
  %subtract.64 = fsub float %add.60, %add.62
  %multiply.66 = fmul float %subtract.64, %constant.120
  %constant.99 = load float, ptr @6, align 4
  %divide.4 = fdiv float %constant.99, %7
  %add.59 = fadd float %divide.4, %constant.119
  %multiply.65 = fmul float %add.59, %constant.118
  %constant.98 = load float, ptr @3, align 4
  %add.58 = fadd float %multiply.65, %constant.98
  %multiply.64 = fmul float %add.59, %add.58
  %constant.97 = load float, ptr @7, align 4
  %add.57 = fadd float %multiply.64, %constant.97
  %multiply.63 = fmul float %add.59, %add.57
  %subtract.63 = fsub float %multiply.63, %add.58
  %constant.96 = load float, ptr @2, align 4
  %add.56 = fadd float %subtract.63, %constant.96
  %multiply.62 = fmul float %add.59, %add.56
  %subtract.62 = fsub float %multiply.62, %add.57
  %constant.95 = load float, ptr @8, align 4
  %add.55 = fadd float %subtract.62, %constant.95
  %multiply.61 = fmul float %add.59, %add.55
  %subtract.61 = fsub float %multiply.61, %add.56
  %constant.94 = load float, ptr @1, align 4
  %add.54 = fadd float %subtract.61, %constant.94
  %multiply.60 = fmul float %add.59, %add.54
  %subtract.60 = fsub float %multiply.60, %add.55
  %constant.93 = load float, ptr @10, align 4
  %add.53 = fadd float %subtract.60, %constant.93
  %multiply.59 = fmul float %add.59, %add.53
  %subtract.59 = fsub float %multiply.59, %add.54
  %constant.92 = load float, ptr @9, align 4
  %add.52 = fadd float %subtract.59, %constant.92
  %subtract.58 = fsub float %add.52, %add.54
  %multiply.58 = fmul float %subtract.58, %constant.120
  %9 = call float @llvm.sqrt.f32(float %7)
  %10 = fdiv float 1.000000e+00, %9
  %multiply.57 = fmul float %multiply.58, %10
  %11 = trunc i8 %8 to i1
  %12 = select i1 %11, float %multiply.66, float %multiply.57
  %13 = fptrunc float %12 to half
  %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0
  store half %13, ptr %14, align 2, !alias.scope !3
  %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1
  store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1
  store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0:                           ; preds = %fusion.loop_header.dim.0
  br label %return
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fabs.f32(float %0) #1

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.sqrt.f32(float %0) #1

attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 6}
!2 = !{i64 8}
!3 = !{!4}
!4 = !{!"buffer: {index:0, offset:0, size:6}", !5}
!5 = !{!"XLA global AA domain"}
```
2022-06-15    Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""    (Phoebe Wang, 1 file, -7/+10)
Fixed the missing SQRT promotion. Adding several missing operations too.
2022-06-15    Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""    (Thomas Joerg, 1 file, -10/+7)
This reverts commit 6e02e27536b9de25a651cfc9c2966ce471169355. This introduces a crash in the backend. Reproducer in MLIR's LLVM dialect follows. Let me know if you have trouble reproducing this.
```
module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>)
  llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00")
  llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
  llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} {
    %0 = llvm.mlir.constant(8 : i32) : i32
    %1 = llvm.mlir.constant(8 : index) : i64
    %2 = llvm.mlir.constant(2 : index) : i64
    %3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16>
    %4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32>
    %5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16>
    %6 = llvm.mlir.constant(false) : i1
    %7 = llvm.mlir.constant(1 : i32) : i32
    %8 = llvm.mlir.constant(0 : i32) : i32
    %9 = llvm.mlir.constant(4 : index) : i64
    %10 = llvm.mlir.constant(0 : index) : i64
    %11 = llvm.mlir.constant(1 : index) : i64
    %12 = llvm.mlir.constant(-1 : index) : i64
    %13 = llvm.mlir.null : !llvm.ptr<f16>
    %14 = llvm.getelementptr %13[%9] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %15 = llvm.ptrtoint %14 : !llvm.ptr<f16> to i64
    %16 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
    %17 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
    %18 = llvm.mlir.null : !llvm.ptr<i64>
    %19 = llvm.getelementptr %18[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %20 = llvm.ptrtoint %19 : !llvm.ptr<i64> to i64
    %21 = llvm.alloca %20 x i64 : (i64) -> !llvm.ptr<i64>
    llvm.br ^bb1(%10 : i64)
  ^bb1(%22: i64):  // 2 preds: ^bb0, ^bb2
    %23 = llvm.icmp "slt" %22, %arg1 : i64
    llvm.cond_br %23, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    %24 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
    %25 = llvm.getelementptr %24[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
    %26 = llvm.add %22, %11 : i64
    %27 = llvm.getelementptr %25[%26] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %28 = llvm.load %27 : !llvm.ptr<i64>
    %29 = llvm.getelementptr %21[%22] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %28, %29 : !llvm.ptr<i64>
    llvm.br ^bb1(%26 : i64)
  ^bb3:  // pred: ^bb1
    llvm.br ^bb4(%10, %11 : i64, i64)
  ^bb4(%30: i64, %31: i64):  // 2 preds: ^bb3, ^bb5
    %32 = llvm.icmp "slt" %30, %arg1 : i64
    llvm.cond_br %32, ^bb5, ^bb6
  ^bb5:  // pred: ^bb4
    %33 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
    %34 = llvm.getelementptr %33[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
    %35 = llvm.add %30, %11 : i64
    %36 = llvm.getelementptr %34[%35] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %37 = llvm.load %36 : !llvm.ptr<i64>
    %38 = llvm.mul %37, %31 : i64
    llvm.br ^bb4(%35, %38 : i64, i64)
  ^bb6:  // pred: ^bb4
    %39 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    %40 = llvm.getelementptr %39[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %41 = llvm.load %40 : !llvm.ptr<ptr<f16>>
    %42 = llvm.getelementptr %13[%11] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %43 = llvm.ptrtoint %42 : !llvm.ptr<f16> to i64
    %44 = llvm.alloca %7 x i32 : (i32) -> !llvm.ptr<i32>
    llvm.store %8, %44 : !llvm.ptr<i32>
    %45 = llvm.call @_mlir_ciface_tf_alloc(%arg0, %31, %43, %8, %7, %44) : (!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
    %46 = llvm.bitcast %45 : !llvm.ptr<i8> to !llvm.ptr<f16>
    %47 = llvm.icmp "eq" %31, %10 : i64
    %48 = llvm.or %6, %47 : i1
    %49 = llvm.mlir.null : !llvm.ptr<i8>
    %50 = llvm.icmp "ne" %45, %49 : !llvm.ptr<i8>
    %51 = llvm.or %50, %48 : i1
    llvm.cond_br %51, ^bb7, ^bb13
  ^bb7:  // pred: ^bb6
    %52 = llvm.urem %31, %9 : i64
    %53 = llvm.sub %31, %52 : i64
    llvm.br ^bb8(%10 : i64)
  ^bb8(%54: i64):  // 2 preds: ^bb7, ^bb9
    %55 = llvm.icmp "slt" %54, %53 : i64
    llvm.cond_br %55, ^bb9, ^bb10
  ^bb9:  // pred: ^bb8
    %56 = llvm.mul %54, %11 : i64
    %57 = llvm.add %56, %10 : i64
    %58 = llvm.add %57, %10 : i64
    %59 = llvm.getelementptr %41[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %60 = llvm.bitcast %59 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    %61 = llvm.load %60 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %62 = "llvm.intr.sqrt"(%61) : (vector<4xf16>) -> vector<4xf16>
    %63 = llvm.fdiv %5, %62 : vector<4xf16>
    %64 = llvm.getelementptr %46[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %65 = llvm.bitcast %64 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %63, %65 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %66 = llvm.add %54, %9 : i64
    llvm.br ^bb8(%66 : i64)
  ^bb10:  // pred: ^bb8
    %67 = llvm.icmp "ult" %53, %31 : i64
    llvm.cond_br %67, ^bb11, ^bb12
  ^bb11:  // pred: ^bb10
    %68 = llvm.mul %53, %12 : i64
    %69 = llvm.add %31, %68 : i64
    %70 = llvm.mul %53, %11 : i64
    %71 = llvm.add %70, %10 : i64
    %72 = llvm.trunc %69 : i64 to i32
    %73 = llvm.mlir.undef : vector<4xi32>
    %74 = llvm.insertelement %72, %73[%8 : i32] : vector<4xi32>
    %75 = llvm.shufflevector %74, %73 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xi32>, vector<4xi32>
    %76 = llvm.icmp "slt" %4, %75 : vector<4xi32>
    %77 = llvm.add %71, %10 : i64
    %78 = llvm.getelementptr %41[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %79 = llvm.bitcast %78 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    %80 = llvm.intr.masked.load %79, %76, %3 {alignment = 2 : i32} : (!llvm.ptr<vector<4xf16>>, vector<4xi1>, vector<4xf16>) -> vector<4xf16>
    %81 = llvm.bitcast %16 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %80, %81 : !llvm.ptr<vector<4xf16>>
    %82 = llvm.load %81 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %83 = "llvm.intr.sqrt"(%82) : (vector<4xf16>) -> vector<4xf16>
    %84 = llvm.fdiv %5, %83 : vector<4xf16>
    %85 = llvm.bitcast %17 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %84, %85 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %86 = llvm.load %85 : !llvm.ptr<vector<4xf16>>
    %87 = llvm.getelementptr %46[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %88 = llvm.bitcast %87 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.intr.masked.store %86, %88, %76 {alignment = 2 : i32} : vector<4xf16>, vector<4xi1> into !llvm.ptr<vector<4xf16>>
    llvm.br ^bb12
  ^bb12:  // 2 preds: ^bb10, ^bb11
    %89 = llvm.mul %2, %1 : i64
    %90 = llvm.mul %arg1, %2 : i64
    %91 = llvm.add %90, %11 : i64
    %92 = llvm.mul %91, %1 : i64
    %93 = llvm.add %89, %92 : i64
    %94 = llvm.alloca %93 x i8 : (i64) -> !llvm.ptr<i8>
    %95 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    llvm.store %46, %95 : !llvm.ptr<ptr<f16>>
    %96 = llvm.getelementptr %95[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    llvm.store %46, %96 : !llvm.ptr<ptr<f16>>
    %97 = llvm.getelementptr %95[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %98 = llvm.bitcast %97 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
    llvm.store %10, %98 : !llvm.ptr<i64>
    %99 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>
    %100 = llvm.getelementptr %99[%10, 3] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>, i64) -> !llvm.ptr<i64>
    %101 = llvm.getelementptr %100[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %102 = llvm.sub %arg1, %11 : i64
    llvm.br ^bb14(%102, %11 : i64, i64)
  ^bb13:  // pred: ^bb6
    %103 = llvm.mlir.addressof @error_message_2208944672953921889 : !llvm.ptr<array<42 x i8>>
    %104 = llvm.getelementptr %103[%10, %10] : (!llvm.ptr<array<42 x i8>>, i64, i64) -> !llvm.ptr<i8>
    llvm.call @_mlir_ciface_tf_report_error(%arg0, %0, %104) : (!llvm.ptr<i8>, i32, !llvm.ptr<i8>) -> ()
    %105 = llvm.mul %2, %1 : i64
    %106 = llvm.mul %2, %10 : i64
    %107 = llvm.add %106, %11 : i64
    %108 = llvm.mul %107, %1 : i64
    %109 = llvm.add %105, %108 : i64
    %110 = llvm.alloca %109 x i8 : (i64) -> !llvm.ptr<i8>
    %111 = llvm.bitcast %110 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    llvm.store %13, %111 : !llvm.ptr<ptr<f16>>
    %112 = llvm.getelementptr %111[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    llvm.store %13, %112 : !llvm.ptr<ptr<f16>>
    %113 = llvm.getelementptr %111[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %114 = llvm.bitcast %113 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
    llvm.store %10, %114 : !llvm.ptr<i64>
    %115 = llvm.call @malloc(%109) : (i64) -> !llvm.ptr<i8>
    "llvm.intr.memcpy"(%115, %110, %109, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
    %116 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
    %117 = llvm.insertvalue %10, %116[0] : !llvm.struct<(i64, ptr<i8>)>
    %118 = llvm.insertvalue %115, %117[1] : !llvm.struct<(i64, ptr<i8>)>
    llvm.return %118 : !llvm.struct<(i64, ptr<i8>)>
  ^bb14(%119: i64, %120: i64):  // 2 preds: ^bb12, ^bb15
    %121 = llvm.icmp "sge" %119, %10 : i64
    llvm.cond_br %121, ^bb15, ^bb16
  ^bb15:  // pred: ^bb14
    %122 = llvm.getelementptr %21[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %123 = llvm.load %122 : !llvm.ptr<i64>
    %124 = llvm.getelementptr %100[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %123, %124 : !llvm.ptr<i64>
    %125 = llvm.getelementptr %101[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %120, %125 : !llvm.ptr<i64>
    %126 = llvm.mul %120, %123 : i64
    %127 = llvm.sub %119, %11 : i64
    llvm.br ^bb14(%127, %126 : i64, i64)
  ^bb16:  // pred: ^bb14
    %128 = llvm.call @malloc(%93) : (i64) -> !llvm.ptr<i8>
    "llvm.intr.memcpy"(%128, %94, %93, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
    %129 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
    %130 = llvm.insertvalue %arg1, %129[0] : !llvm.struct<(i64, ptr<i8>)>
    %131 = llvm.insertvalue %128, %130[1] : !llvm.struct<(i64, ptr<i8>)>
    llvm.return %131 : !llvm.struct<(i64, ptr<i8>)>
  }
  llvm.func @_mlir_ciface_Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<struct<(i64, ptr<i8>)>>, %arg1: !llvm.ptr<i8>, %arg2: !llvm.ptr<struct<(i64, ptr<i8>)>>) attributes {llvm.emit_c_interface, tf_entry} {
    %0 = llvm.load %arg2 : !llvm.ptr<struct<(i64, ptr<i8>)>>
    %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
    %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>
    %3 = llvm.call @Rsqrt_CPU_DT_HALF_DT_HALF(%arg1, %1, %2) : (!llvm.ptr<i8>, i64, !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)>
    llvm.store %3, %arg0 : !llvm.ptr<struct<(i64, ptr<i8>)>>
    llvm.return
  }
}
```
2022-06-15    Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"    (Phoebe Wang, 1 file, -7/+10)
Disabled 2 mlir tests because the runtime doesn't support `_Float16`; see the issue here: https://github.com/llvm/llvm-project/issues/55992
2022-06-12    Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"    (Mehdi Amini, 1 file, -10/+7)
This reverts commit 2d2da259c8726fd5c974c01122a9689981a12196. This breaks an MLIR integration test (JIT crash); reverting in the meantime.
2022-06-12    [X86][RFC] Enable `_Float16` type support on X86 following the psABI    (Phoebe Wang, 1 file, -7/+10)
GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following the latest X86 psABI (https://gitlab.com/x86-psABIs). _Float16 arithmetic will be performed using native half-precision. If native arithmetic instructions are not available, it will be performed at a higher precision (currently always float) and then truncated down to _Float16 immediately after each single arithmetic operation.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D107082
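A small sketch of what this enables: `_Float16` maps to the IR `half` type, and each arithmetic operation either uses native FP16 instructions or is performed in `float` and immediately truncated back, as the commit describes:
```
; C equivalent: _Float16 add(_Float16 a, _Float16 b) { return a + b; }
; Without native FP16 support, this fadd is promoted to float and the
; result is truncated back to half right after the single operation.
define half @add_f16(half %a, half %b) {
  %s = fadd half %a, %b
  ret half %s
}
```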