path: root/llvm/lib/Target
Age | Commit message | Author | Files | Lines
2 hours | [X86][MemFold] Allow masked load folding if masks are equal (#161074) | Phoebe Wang | 2 | -1/+35
Inspired by #160920#issuecomment-3341816198
6 hours | [LoongArch] Add option for merge base offset pass | wanglei | 1 | -1/+6
Add the `loongarch-enable-merge-offset` option to allow disabling the `MergeBaseOffset` pass when optimization is enabled.

Reviewers: SixWeining, heiher

Reviewed By: SixWeining, heiher

Pull Request: https://github.com/llvm/llvm-project/pull/161063
7 hours | [SPIRV] Frexp intrinsic implementation (#157436) | Ebin-McW | 2 | -1/+55
- Make use of the OpenCL extended instruction frexp.
- Create a variable and pass it to the OpExtInst instruction.
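For illustration, C++'s `std::frexp` has the same shape as the OpenCL `frexp` builtin: it returns the significand and writes the exponent through a pointer, which is presumably why the lowering needs a variable to pass to `OpExtInst`. The sketch below (hypothetical names `FrexpResult` and `frexpPair`) packs both results into one struct, similar in spirit to the `{ fraction, exponent }` pair that `llvm.frexp` returns.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result pair: significand in [0.5, 1) plus a power-of-two exponent.
struct FrexpResult {
  double Significand;
  int Exponent;
};

FrexpResult frexpPair(double X) {
  int Exp = 0;
  // std::frexp returns the significand and writes the exponent through &Exp,
  // the same out-parameter style as the OpenCL builtin.
  double Sig = std::frexp(X, &Exp);
  // For finite, nonzero X: X == Sig * 2^Exp and 0.5 <= |Sig| < 1.
  assert(X == 0.0 || !std::isfinite(X) ||
         (std::fabs(Sig) >= 0.5 && std::fabs(Sig) < 1.0));
  return {Sig, Exp};
}
```

For example, `frexpPair(8.0)` yields a significand of 0.5 and an exponent of 4, since 8 = 0.5 * 2^4.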
17 hours | [CodeGen] Get rid of incorrect `std` template specializations (#160804) | A. Jiang | 2 | -5/+5
This patch renames comparators
- from `std::equal_to<llvm::rdf::RegisterRef>` to `llvm::rdf::RegisterRefEqualTo`, and
- from `std::less<llvm::rdf::RegisterRef>` to `llvm::rdf::RegisterRefLess`.

The original specializations don't satisfy the requirements for the original `std` templates by being stateful and non-default-constructible, so they make the program have UB due to C++17 [namespace.std]/2, C++20/23 [namespace.std]/5.

> A program may explicitly instantiate a class template defined in the standard library only if the declaration
> - depends on the name of at least one program-defined type, and
> - the instantiation meets the standard library requirements for the original template.
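A minimal sketch of the pattern the patch moves to, using hypothetical stand-in types (`RegRef`, `RegContext`, `RegRefLess`) rather than the real `llvm::rdf` classes: a stateful comparator is declared as its own program-defined type, like `RegisterRefLess`, and handed to the container explicitly instead of being a specialization of `std::less`.

```cpp
#include <cassert>
#include <set>
#include <type_traits>

// Hypothetical stand-in; not the real llvm::rdf::RegisterRef API.
struct RegRef {
  unsigned Reg;
  unsigned Mask;
};

// Hypothetical context that the comparison needs to consult.
struct RegContext {
  bool IgnoreMask;
};

// A program-defined, stateful comparator. It holds a pointer to external
// context and has no default constructor, so it is a named type of its own
// rather than a specialization of std::less<RegRef>.
struct RegRefLess {
  const RegContext *Ctx;
  explicit RegRefLess(const RegContext &C) : Ctx(&C) {}
  bool operator()(const RegRef &A, const RegRef &B) const {
    if (A.Reg != B.Reg)
      return A.Reg < B.Reg;
    // Masks only matter when the context says they are significant.
    return !Ctx->IgnoreMask && A.Mask < B.Mask;
  }
};

static_assert(!std::is_default_constructible<RegRefLess>::value,
              "the comparator needs context to be constructed");
```

Usage: construct the comparator from its context first (`RegRefLess Cmp(Ctx);`) and pass it to the container (`std::set<RegRef, RegRefLess> S(Cmp);`), which is exactly what a default-constructed `std::less` could not provide.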
18 hours | [SPIRV] Added opencl Pipe builtins (#135335) | Ebin-McW | 3 | -1/+107
- Added OpenCL Pipe builtins.
- Pipe instructions were added in TableGen and lowered in SPIRVBuiltins.cpp.

---------

Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
18 hours | [SPIRV] Added constraint for SPV_INTEL_bindless_image extension (#160249) | Ebin-McW | 1 | -3/+34
Added constraints related to the addressing model, as required by the extension specification. This conforms with the implementation in the translator. Same as PR #160089, with all issues solved.
18 hours | [SPIRV] Add support for the extension SPV_EXT_relaxed_printf_string_address_space (#160245) | Ebin-McW | 2 | -1/+35
Added support for the extension, which allows more storage classes for printf strings.
18 hours | [SPIRV] Added lowering for the debugtrap intrinsic (#157442) | Subash B | 1 | -1/+13
Mapped the llvm.debugtrap intrinsic to OpNop in the SPIR-V backend, since SPIR-V has no direct equivalent; tests are included.
18 hours | [SPIRV] Added support for the constrained comparison intrinsics (#157439) | Subash B | 1 | -0/+24
Added SPIR-V support for constrained floating-point comparison intrinsics (fcmp, fcmps) with lowering and tests.
20 hours | [SPIRV] Fix type mismatch assertion in insertvalue. (#143131) | Tim Besard | 1 | -5/+6
The code was incorrectly converting all `undef` arguments to `i32`, while the `spv_insertv` intrinsic only expects that for the first operand, representing the aggregate type.

Fixes https://github.com/llvm/llvm-project/issues/127977

---------

Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
22 hours | [LoongArch] Add patterns to support `[x]vadda.{b/h/w/d}` generation (#160674) | ZhaoQi | 3 | -0/+34
This commit adds patterns for LSX and LASX to support generating `[x]vadda.{b/h/w/d}` instructions.

Note: For convenience, this commit also sets `ISD::ABS` as legal. As shown in the tests, this brings no change to the results, which are the same as those obtained by expanding it before. But setting it as legal brings more vectorization opportunities to IR transformations, which may bring more vector optimization chances for later stages and the backend.
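A scalar reference for the computation these patterns target, assuming `vadda.w` adds the absolute values of corresponding 32-bit elements (our reading of the instruction; check the LoongArch LSX manual). The function `addAbs` is hypothetical; a loop like this is the kind of source that can become `add(abs(a), abs(b))`, the shape that keeping `ISD::ABS` legal helps preserve for instruction selection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-wise |a[i]| + |b[i]| on 32-bit lanes (assumed semantics of vadda.w).
std::vector<int32_t> addAbs(const std::vector<int32_t> &A,
                            const std::vector<int32_t> &B) {
  assert(A.size() == B.size());
  std::vector<int32_t> Out(A.size());
  for (std::size_t I = 0; I < A.size(); ++I) {
    // Work in unsigned arithmetic so INT32_MIN wraps instead of overflowing,
    // matching the wrapping behaviour of a hardware abs.
    uint32_t UA = A[I] < 0 ? 0u - static_cast<uint32_t>(A[I])
                           : static_cast<uint32_t>(A[I]);
    uint32_t UB = B[I] < 0 ? 0u - static_cast<uint32_t>(B[I])
                           : static_cast<uint32_t>(B[I]);
    Out[I] = static_cast<int32_t>(UA + UB);
  }
  return Out;
}
```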
25 hours | [ARM] Remove `UnsafeFPMath` uses (#151275) | paperchalice | 1 | -2/+17
Try to remove `UnsafeFPMath` uses in the ARM backend. These global flags block some improvements like https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797. Remove them incrementally.
28 hours | [MIPS][float] Fixed SingleFloat codegen on N32/N64 targets (#140575) | Davide Mor | 4 | -23/+72
This patch aims at making the combination of single-float and the N32/N64 ABIs work properly. Right now, when both options are enabled, the compiler chooses an incorrect ABI and in some cases even generates wrong instructions.

The floating point behavior on MIPS is controlled through 3 flags: soft-float, single-float, fp64. This makes things complicated because fp64 indicates the presence of 64bit floating point registers, but cannot be easily disabled (the mips3 feature requires it, but mips3 CPUs with only 32bit floating point exist). Also, a missing fp64 doesn't actually disable 64bit floating point operations, because certain MIPS1/2 CPUs support 64bit floating point with 32bit registers, hence the single-float option. I'm guessing that originally single-float was only intended for the latter case, and that's the reason why it doesn't work properly on 64bit targets.

So this patch does the following:
- Make single-float a "master disable": even if fp64 is enabled, this should completely disable generation of 64bit floating point operations, making it available on targets which hard-require fp64.
- Add proper calling conventions for N32/N64 single-float combinations.
- Fix up codegen to not generate certain 64bit floating point operations; apparently, not assigning a register class to f64 values is not enough to prevent them from showing up.
- Add tests for the new calling conventions and codegen.
38 hours | [X86] matchVPMADD52 - only use 512-bit MADD52 on AVX512IFMA targets (#161011) | Simon Pilgrim | 1 | -3/+5
If we have an AVX512 target capable of AVXIFMA but not AVX512IFMA, then we must split 512-bit (or larger) types into 256-bit pieces.

Fixes #160928
39 hours | [ARM] Generate build-attributes more correctly in the presence of intrinsic declarations. (#160749) | David Green | 1 | -8/+11
This code doesn't work very well, but this makes it work when intrinsic definitions are present. It now excludes function declarations from the set of functions whose attributes it looks at. The code would have worked better before 0ab5b5b8581d9f2951575f7245824e6e4fc57dec, when module-level attributes could provide the information used to construct build-attributes.
2 days | Revert "[TTI][RISCV] Add cost modelling for intrinsic vp.load.ff (#160470)" | ShihPo Hung | 4 | -35/+0
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
2 days | AMDGPU: Check if immediate is legal for av_mov_b32_imm_pseudo (#160819) | Matt Arsenault | 1 | -0/+9
This is primarily to avoid folding a frame index materialized into an SGPR into the pseudo; this would end up looking like:

    %sreg = s_mov_b32 %stack.0
    %av_32 = av_mov_b32_imm_pseudo %sreg

which is not useful.

Match the check used for the b64 case. This is limited to the pseudo to avoid regression due to gfx908's special case: it is expecting to pass here with v_accvgpr_write_b32 for illegal cases, and stay in the intermediate state with an SGPR input. This avoids regressions in a future patch.
2 days | [NVPTX] legalize v2i32 to improve compatibility with v2f32 (#153478) | Princeton Ferro | 5 | -82/+124
Since v2f32 is legal but v2i32 is not, this causes some sequences of operations like bitcast (build_vector) to be lowered inefficiently.
2 days | [ASan][RISCV] Teach AddressSanitizer to support indexed load/store. (#160443) | Hank Chang | 1 | -0/+38
This patch is based on https://github.com/llvm/llvm-project/pull/159713.

This patch extends AddressSanitizer to support indexed/segment instructions in RVV. It enables proper instrumentation for these memory operations. A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to describe the offset between the base pointer and the actual memory reference address.

Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
3 days | [AMDGPU][True16][CodeGen] Avoid setting hi part in copysign (#160891) | Piotr Sobczak | 1 | -2/+3
This is a temporary fix for a regression from #154875. The new pattern sets the hi part of the V_BFI result, and that confuses si-fix-sgpr-copies, which is where the proper fix is likely to be.

During si-fix-sgpr-copies, an incorrect fold happens:

    %86:vgpr_32 = V_BFI_B32_e64
    %87:sreg_32 = COPY %86.hi16:vgpr_32
    %95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, killed %87:sreg_32, 0, %63:vgpr_16, 0, 0

into

    %86:vgpr_32 = V_BFI_B32_e64
    %95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, %86.lo16:vgpr_32, 0, %63:vgpr_16, 0, 0

Fixes: Vulkan CTS dEQP-VK.glsl.builtin.precision_fp16_storage32b.*.
3 days | [LLVM][M68k] Fix build failure caused by #160797 (#160926) | Rahul Joshi | 1 | -7/+4
Fix M68k build failures caused by https://github.com/llvm/llvm-project/pull/160797
3 days | [RISCV] Update SiFive7's scheduling models with their optimizations on permutation instructions (#160763) | Min-Yih Hsu | 1 | -10/+94
In newer SiFive7 cores like X390, permutation instructions like vrgather.vv operating on LMUL smaller than a single DLEN can take a constant number of cycles. For slightly larger data that fits within the constraint `log2(SEW/8) + log2(LMUL) <= log2(DLEN / 32)`, these instructions can also take a number of cycles proportional to the square of LMUL, rather than proportional to VL.

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
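The size condition above can be checked without floating point: because `log2` is monotonic, `log2(SEW/8) + log2(LMUL) <= log2(DLEN/32)` is equivalent to `(SEW/8) * LMUL <= DLEN/32`. A sketch under that reading, with a hypothetical helper `fitsQuadraticRegime` that takes LMUL as a fraction so fractional LMUL values work; it only evaluates the constraint and does not model the cycle counts in the scheduling model.

```cpp
#include <cassert>

// Evaluates log2(SEW/8) + log2(LMUL) <= log2(DLEN/32) for LMUL = LMULNum/LMULDen.
// Equivalent integer form: (SEW/8) * LMULNum * 32 <= DLEN * LMULDen.
bool fitsQuadraticRegime(unsigned SEW, unsigned LMULNum, unsigned LMULDen,
                         unsigned DLEN) {
  assert(SEW >= 8 && LMULNum > 0 && LMULDen > 0 && DLEN >= 32);
  return (SEW / 8) * LMULNum * 32 <= DLEN * LMULDen;
}
```

For example, with an assumed DLEN of 256, SEW=32 satisfies the constraint up to LMUL=2 but not at LMUL=4.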
3 days | [AMDGPU] Ensure divergence for v_alignbit (#129159) | Jeffrey Byrnes | 1 | -7/+7
Selecting VGPRs for the uniform version of this pattern may lead to unnecessary VGPR usage and waterfall loops.
3 days | [DirectX] Validating Root flags are denying shader stage (#160919) | joaosaffran | 1 | -12/+60
Root Signature Flags allow blocking the compilation of certain shader stages. This PR implements validation that notifies the user if they compile a root signature that denies such a shader stage.

Closes: https://github.com/llvm/llvm-project/issues/153062
Previously approved: https://github.com/llvm/llvm-project/pull/153287

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
3 days | [NFC][PowerPC] Consolidate predicate definitions into PPC.td (#160579) | Lei Huang | 7 | -71/+66
Consolidate predicate definitions into top level entry point for PowerPC target `PPC.td` and remove duplicate definitions for 32/64 bit sub-target checks.
3 days | [NFC][LLVM] Pass/return SMLoc by value instead of const reference (#160797) | Rahul Joshi | 4 | -21/+18
SMLoc itself encapsulates just a pointer, so there is no need to pass or return it by reference.
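A sketch of why pass-by-value suits such a type, using a hypothetical `Loc` class shaped like `SMLoc` (a single pointer member) and a hypothetical `advance` helper: the object is trivially copyable, so passing it by value typically costs no more than passing a pointer, while a `const Loc &` parameter would add an indirection.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical pointer-wrapping location type, shaped like llvm::SMLoc.
class Loc {
  const char *Ptr = nullptr;

public:
  Loc() = default;
  static Loc fromPointer(const char *P) {
    Loc L;
    L.Ptr = P;
    return L;
  }
  const char *getPointer() const { return Ptr; }
  bool isValid() const { return Ptr != nullptr; }
};

// Trivially copyable, so copies are plain register/memory moves.
static_assert(std::is_trivially_copyable<Loc>::value, "cheap to copy");

// Takes and returns Loc by value rather than by const reference.
Loc advance(Loc L, unsigned N) { return Loc::fromPointer(L.getPointer() + N); }
```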
3 days | [AArch64][GlobalISel] Add support for ldexp (#160517) | Ryan Cowan | 1 | -1/+1
3 days | [X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::VPERMILPV handling (#160849) | Simon Pilgrim | 1 | -0/+2
X86ISD::VPERMILPV shuffles can't create undef/poison themselves, allowing us to fold freeze(vpermilps(x,y)) -> vpermilps(freeze(x),freeze(y)).
3 days | [X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::VPERMV handling (#160845) | Simon Pilgrim | 1 | -0/+2
X86ISD::VPERMV shuffles can't create undef/poison themselves, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y)).
3 days | [X86] canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::PSHUFB handling (#160842) | Simon Pilgrim | 1 | -0/+2
X86ISD::PSHUFB shuffles can't create undef/poison themselves, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y)).
3 days | [TTI][RISCV] Add cost modelling for intrinsic vp.load.ff (#160470) | Shih-Po Hung | 4 | -0/+35
Split out from #151300 to isolate TargetTransformInfo cost modelling for fault-only-first loads from VPlan implementation details. This change adds costing support for vp.load.ff independently of the VPlan work. For now, model a vp.load.ff as cost-equivalent to a vp.load.
3 days | [X86] Set default rounding mode round to nearest for llvm.set.rounding (#160823) | JaydeepChauhan14 | 2 | -1/+2
- Fix https://github.com/llvm/llvm-project/pull/156591#issuecomment-3335218842
- As per https://cdrdv2.intel.com/v1/dl/getContent/671200, the default rounding mode is **round to nearest**.
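For reference, `llvm.get.rounding` and `llvm.set.rounding` encode rounding modes as 0 (toward zero), 1 (to nearest, ties to even), 2 (toward +infinity), 3 (toward -infinity) and 4 (to nearest, ties away from zero). The sketch below maps that encoding to C++ `<cfenv>` macros and falls back to round-to-nearest for anything else; that fallback is an illustrative assumption echoing the commit title, not a transcription of the X86 lowering. `toFenvMode` is a hypothetical name, and the `FE_*` macros used here are only guaranteed to exist on targets that support those modes (x86 does).

```cpp
#include <cassert>
#include <cfenv>

// Maps the llvm.set.rounding encoding onto <cfenv> rounding-mode macros.
// Mode 4 (ties away from zero) has no <cfenv> equivalent, so it falls
// through to the assumed round-to-nearest default like any other value.
int toFenvMode(int LLVMMode) {
  switch (LLVMMode) {
  case 0:
    return FE_TOWARDZERO;
  case 1:
    return FE_TONEAREST;
  case 2:
    return FE_UPWARD;
  case 3:
    return FE_DOWNWARD;
  default:
    return FE_TONEAREST; // assumed default: round to nearest
  }
}
```

Usage: `std::fesetround(toFenvMode(Mode))` would then install the chosen mode in the current floating-point environment.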
3 days | [LoongArch] Generate [x]vldi instructions with special constant splats (#159258) | Zhaoxin Yang | 4 | -21/+125
3 days | [ARM] Improve comment on the 'J' inline asm modifier. (#160712) | Simon Tatham | 1 | -3/+3
An inline asm constraint "Jr", in AArch32, means that if the input value is a compile-time constant in the range -4095 to +4095, then it can be inserted into the assembly language as an immediate operand, and otherwise it will be placed in a register.

The comment in the Arm backend said "It is not clear what this constraint is intended for". I believe the answer is that this range of immediate values is the range you can use as an offset in an LDR or STR instruction. So it's suitable for cases like this:

    asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");

in the same way that the "Ir" constraint is suitable for the immediate in a data-processing instruction such as ADD or EOR.
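The choice the "Jr" constraint makes can be sketched as a range check. `fitsLdrStrImmediate` and `renderOffsetOperand` are hypothetical helpers, not Arm backend functions; they apply the -4095 to +4095 range described above and fall back to a register otherwise.

```cpp
#include <cassert>
#include <string>

// True if Offset fits the LDR/STR immediate offset range used by "J".
bool fitsLdrStrImmediate(long long Offset) {
  return Offset >= -4095 && Offset <= 4095;
}

// Renders the third operand of "str %0,[%1,%2]": an immediate when the value
// is a compile-time constant in range, otherwise the register holding it.
std::string renderOffsetOperand(long long Offset, bool IsConstant,
                                const std::string &Reg) {
  if (IsConstant && fitsLdrStrImmediate(Offset))
    return "#" + std::to_string(Offset);
  return Reg;
}
```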
3 days | [ARM] Remove `UnsafeFPMath` uses in code generation part (#160801) | paperchalice | 2 | -6/+5
Factored out from #151275. Remove all UnsafeFPMath uses except the part related to ABI tags.
3 days | [AMDGPU] Skip debug uses in SIInsertWaitcnts::shouldFlushVmCnt (#160818) | Jay Foad | 1 | -1/+1
3 days | [LoongArch] Custom legalize vector_shuffle to xvpermi.d when possible (#160429) | ZhaoQi | 1 | -9/+30
3 days | [LoongArch] Refine 256-bit vector_shuffle legalization for LASX (#160254) | ZhaoQi | 1 | -32/+47
3 days | [AMDGPU] Avoid constraining RC based on folded into operand (NFC) (#160743) | Josh Hutton | 1 | -4/+9
The RC of the folded operand does not need to be constrained based on the RC of the current operand we are folding into. The purpose of this PR is to facilitate this PR: https://github.com/llvm/llvm-project/pull/151033
3 days | [Mips] Fix atomic min/max generate mips4 instructions when compiling for mips2 (#159717) | yingopq | 1 | -26/+191
Replace the movn/movz instructions with a mixture of beq, move, and sc.

Because the atomic-min-max.ll test broke on the expensive builder, I reverted https://github.com/llvm/llvm-project/pull/149983 and resubmitted this PR.

The reason for the breakage: in the i16/i8 function expandAtomicBinOpSubword, loop2MBB was given two successors; one did not specify the second parameter, while the other used BranchProbability::getOne(), which means 100% probability. This is contradictory. The second successor was also specified incorrectly.

The changes:
* llvm/lib/Target/Mips/MipsExpandPseudo.cpp: Change loop2MBB's second successor to the correct one and delete the second parameter BranchProbability::getOne().
* llvm/test/CodeGen/Mips/atomic-min-max.ll: Add the -verify-machineinstrs option to the RUN command; modify the i16 and i8 tests according to the changes.

Fix #145411.
3 days | [clang][SPARC] Pass 16-aligned structs with the correct alignment in CC (#155829) | Koakuma | 1 | -1/+2
Pad argument registers to preserve overaligned structs in LLVM IR. Additionally, since i128 values will be lowered as split i64 pairs in the backend, correctly set the alignment of such arguments to 16 bytes. This should make clang compliant with the ABI specification and fix https://github.com/llvm/llvm-project/issues/144709.
3 days | [LoongArch] Override shouldScalarizeBinop to enable `extract(binop)->binop(extract)` combination (#159726) | ZhaoQi | 2 | -0/+21
3 days | [LoongArch] Support vector types for hasAndNot to enable more DAG combines (#159056) | ZhaoQi | 1 | -2/+6
After this commit, DAGCombiner has more opportunities to optimize vector `and+...+not` patterns into `andn`. Many combines in DAGCombiner are enabled, but the tests in this commit only show changes from combining `and(add(not))` into `and(not(sub))`.
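One way to see why `and(add(not))` can become `and(not(sub))`: in wrapping two's-complement arithmetic, `~x + y = (-x - 1) + y = -(x - y) - 1 = ~(x - y)`, so the rewritten form exposes an `and(not(...), ...)` that an and-not instruction can compute. A sketch with hypothetical function names; the operand roles are illustrative, not taken from the patch's tests.

```cpp
#include <cassert>
#include <cstdint>

// and(add(not(x), y), z): the form before the combine.
uint32_t beforeCombine(uint32_t X, uint32_t Y, uint32_t Z) {
  return (~X + Y) & Z;
}

// and(not(sub(x, y)), z): the and-not shape after the combine.
// Unsigned arithmetic wraps, so the identity holds for all inputs.
uint32_t afterCombine(uint32_t X, uint32_t Y, uint32_t Z) {
  return ~(X - Y) & Z;
}
```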
3 days | [WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160768) | Heejin Ahn | 2 | -0/+18
`FAKE_USE`s are essentially no-ops, so they have to be removed before running ExplicitLocals so that `drop`s will be correctly inserted to drop those values used by the `FAKE_USE`s.

---

This is a reapplication of #160228, which broke the Wasm waterfall. This PR additionally prevents `FAKE_USE`s' uses from being stackified.

Previously, a 'def' whose first use was a `FAKE_USE` was able to be stackified as `TEE`:

- Before
```
Reg = INST ... // Def
FAKE_USE ..., Reg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```

- After RegStackify
```
DefReg = INST ... // Def
TeeReg, Reg = TEE ... DefReg
FAKE_USE ..., TeeReg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```

And this assumes `DefReg` and `TeeReg` are stackified.

But this PR removes `FAKE_USE`s at the beginning of ExplicitLocals. And later in ExplicitLocals we have a routine to unstackify registers that have no uses left:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L257-L269
(This was added in #149626. At the time, it didn't seem it would trigger the same assertions for `TEE`s, because it was fixing the bug where a terminator was removed in CFGSort (#149097). Details here: https://github.com/llvm/llvm-project/pull/149432#issuecomment-3091444141)

- After `FAKE_USE` removal and unstackification
```
DefReg = INST ...
TeeReg, Reg = TEE ... DefReg
INST ..., Reg, ...
INST ..., Reg, ...
```

And now `TeeReg` is unstackified. This triggered the assertion here, that `TeeReg` should be stackified:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L316

This PR prevents `FAKE_USE`s' uses from being stackified altogether, including the `TEE` transformation. Even when it is not a `TEE` transformation but just a single-use stackification, it does not trigger the assertion, but there's no point stackifying it given that it will be deleted.

---

Fixes https://github.com/emscripten-core/emscripten/issues/25301.
3 days | [NFC][PowerPC] Fix err in instruction class name for stxvp (#160764) | Lei Huang | 2 | -7/+7
4 days | [AMDGPU] Calc IsVALU correctly during UADDO/USUBO selection (#159814) | LU-JOHN | 2 | -7/+14
Fix two bugs. The first bug hid the second bug.

1. Calculate IsVALU correctly during UADDO/USUBO selection. IsVALU should be false if the carryout users are UADDO_CARRY/USUBO_CARRY. However, instruction selection visits uses before defs, so the UADDO_CARRY/USUBO_CARRY nodes are normally (probably always) already converted to S_ADD_CO_PSEUDO/S_SUB_CO_PSEUDO. Fix this by checking for these machine opcodes.

2. Without the first fix, UADDO/USUBO selection always selects the VALU instructions V_ADD_CO_U32_e64/V_SUB_CO_U32_e64. S_UADDO_PSEUDO/S_USUBO_PSEUDO were never selected in the CodeGen/AMDGPU tests. Thus, the S_UADDO_PSEUDO/S_USUBO_PSEUDO cases were never hit in EmitInstrWithCustomInserter. The code generation for S_UADDO_PSEUDO/S_USUBO_PSEUDO had a bug where it could not handle a 32-bit $scc_out.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
4 days | [NFC][x86] Cleanup X86FrameLowering::emitSPUpdate (#156948) | Daniel Paoliello | 1 | -10/+9
I was trying to understand the cases where `emitSPUpdate` may use a `push` to adjust the stack pointer and realized that the code was more complex than it needed to be.

* `Chunk` is redundant, as it is set to `MaxSPChunk` and never modified.
* The only time we use the `push` optimization is if `Offset` is equal to `SlotSize`: `SlotSize` is never as large as `MaxSPChunk`, so the two can never be equal when `std::min` returns `MaxSPChunk`.
* If we use the `push` optimization, then we've finished adjusting the stack and can return early instead of continuing the loop.
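A simplified model of the loop after this cleanup, written as a hypothetical `spUpdateSketch` that returns pseudo-instruction strings instead of building `MachineInstr`s. It covers only the stack-growing (`push`) direction of the slot-sized shortcut; parameter names mirror the message, while the string output and the `CanUsePush` flag are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> spUpdateSketch(int64_t NumBytes, int64_t SlotSize,
                                        int64_t MaxSPChunk, bool CanUsePush) {
  assert(SlotSize > 0 && SlotSize < MaxSPChunk);
  std::vector<std::string> Out;
  bool IsSub = NumBytes < 0; // negative: grow the stack
  int64_t Offset = IsSub ? -NumBytes : NumBytes;
  while (Offset) {
    // Only a remaining adjustment of exactly one slot can use push; since
    // SlotSize < MaxSPChunk, std::min below can never produce it by clamping.
    if (IsSub && CanUsePush && Offset == SlotSize) {
      Out.push_back("push");
      return Out; // stack fully adjusted: return early
    }
    int64_t ThisVal = std::min(Offset, MaxSPChunk); // no separate Chunk variable
    Out.push_back((IsSub ? "sub " : "add ") + std::to_string(ThisVal));
    Offset -= ThisVal;
  }
  return Out;
}
```

For example, growing the stack by 18 bytes with an 8-byte slot and a 10-byte chunk limit yields `sub 10` followed by `push`.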
4 days | [llvm] Fix X86InstrInfo.cpp build after #160188 | Jan Svoboda | 1 | -0/+1
4 days | [NVPTX] Reland `mad.wide` combine under (default off) CLI option (#160214) | Justin Fargnoli | 3 | -1/+19
Users reported regressions to important matmul kernels as a result of #155024. Although #155024 was a revert, this PR should allow them to recover some of the lost performance.
4 days | [llvm] Add `vfs::FileSystem` to `PassBuilder` (#160188) | Jan Svoboda | 1 | -0/+2
Some LLVM passes need access to the filesystem to read configuration files and similar. In some places, this is achieved by grabbing the VFS from `PGOOptions`, but some passes don't have access to these and resort to just calling `vfs::getRealFileSystem()`. This PR allows setting the VFS directly on `PassBuilder` that's able to pass it down to all passes that need it.