path: root/llvm/lib
Age | Commit message | Author | Files | Lines
98 min. | [SPIRV] Added opencl Pipe builtins (#135335) (HEAD, main) | Ebin-McW | 3 | -1/+107
- Added OpenCL pipe builtins.
- Pipe instructions were added in TableGen and lowered in SPIRVBuiltins.cpp.
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
100 min. | [SPIRV] Added constraint for SPV_INTEL_bindless_image extension (#160249) | Ebin-McW | 1 | -3/+34
Added constraints related to the addressing model, as specified in the extension specification. This conforms with the implementation in the translator. Same as PR #160089, with all issues resolved.
102 min. | [SPIRV] Add support for the extension SPV_EXT_relaxed_printf_string_address_space (#160245) | Ebin-McW | 2 | -1/+35
Added support for the extension, which allows more storage classes for printf strings.
107 min. | [SPIRV] Added lowering for the debugtrap intrinsic (#157442) | Subash B | 1 | -1/+13
Mapped the llvm.debugtrap intrinsic to OpNop in the SPIR-V backend, since SPIR-V has no direct equivalent, and added tests.
110 min. | [SPIRV] Added support for the constrained comparison intrinsics (#157439) | Subash B | 1 | -0/+24
Added SPIR-V support for the constrained floating-point comparison intrinsics (fcmp, fcmps), with lowering and tests.
3 hours | [SPIRV] Fix type mismatch assertion in insertvalue. (#143131) | Tim Besard | 1 | -5/+6
The code was incorrectly converting all `undef` arguments to `i32`, while the `spv_insertv` intrinsic only expects that for the first operand, which represents the aggregate type.
Fixes https://github.com/llvm/llvm-project/issues/127977
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
5 hours | [VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC). | Florian Hahn | 1 | -3/+1
Only VPlan pattern matching is used in the file, so move the using statement to the top level.
5 hours | [LoongArch] Add patterns to support `[x]vadda.{b/h/w/d}` generation (#160674) | ZhaoQi | 3 | -0/+34
This commit adds patterns for LSX and LASX to support generating `[x]vadda.{b/h/w/d}` instructions.
Note: for convenience, this commit also sets `ISD::ABS` as legal. As shown in the tests, this brings no change to the results, which are the same as those obtained by expanding it before. However, marking it legal exposes more vectorization opportunities to IR transformations, which may bring more vector optimization chances for later stages and the backend.
6 hours | [LV] Clarify nature of legacy CSE (NFC) (#160855) | Ramkumar Ramachandra | 1 | -3/+4
In order to avoid conflating the legacy CSE with the VPlan-based one, rename the legacy CSE and insert a FIXME to clarify the nature of the legacy CSE.
9 hours | [ARM] Remove `UnsafeFPMath` uses (#151275) | paperchalice | 1 | -2/+17
Try to remove `UnsafeFPMath` uses in the ARM backend. These global flags block some improvements like https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797. Remove them incrementally.
12 hours | [MIPS][float] Fixed SingleFloat codegen on N32/N64 targets (#140575) | Davide Mor | 4 | -23/+72
This patch aims at making the combination of single-float and the N32/N64 ABIs work properly. Right now, when both options are enabled, the compiler chooses an incorrect ABI and in some cases even generates wrong instructions.
The floating-point behavior on MIPS is controlled through 3 flags: soft-float, single-float, and fp64. This makes things complicated because fp64 indicates the presence of 64-bit floating-point registers, but cannot be easily disabled (the mips3 feature requires it, but mips3 CPUs with only 32-bit floating point exist). Also, if fp64 is missing, it doesn't actually disable 64-bit floating-point operations, because certain MIPS1/2 CPUs support 64-bit floating point with 32-bit registers; hence the single-float option.
I'm guessing that originally single-float was only intended for the latter case, and that's the reason why it doesn't work properly on 64-bit targets. So this patch does the following:
- Make single-float a "master disable": even if fp64 is enabled, it should completely disable generation of 64-bit floating-point operations, making it available on targets which hard-require fp64.
- Add proper calling conventions for N32/N64 single-float combinations.
- Fix up codegen to not generate certain 64-bit floating-point operations; apparently, not assigning a register class to f64 values is not enough to prevent them from showing up.
- Add tests for the new calling conventions and codegen.
12 hours | [SDAG] Constant fold frexp in signed way (#161015) | Hongyu Chen | 1 | -2/+2
Fixes #160981
The exponent part of a floating-point number is signed. This patch prevents treating it as unsigned.
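A quick illustration of why the exponent has to be folded as a signed value: any input with magnitude below 0.5 yields a negative exponent. The sketch uses Python's `math.frexp`, which follows the C `frexp` convention (mantissa magnitude in [0.5, 1)); `as_unsigned_i32` is a hypothetical helper that only shows what an unsigned 32-bit interpretation of that exponent would produce, not the SelectionDAG code itself.

```python
import math


def frexp_parts(x):
    """Split x into (mantissa, exponent) with x == mantissa * 2**exponent."""
    return math.frexp(x)


def as_unsigned_i32(exponent):
    """Hypothetical: reinterpret a signed exponent as an unsigned 32-bit value."""
    return exponent & 0xFFFFFFFF


mantissa, exponent = frexp_parts(0.25)  # 0.25 == 0.5 * 2**-1
print(mantissa, exponent)               # 0.5 -1
print(as_unsigned_i32(exponent))        # 4294967295, a nonsensical exponent
```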
18 hours | [VPlan] Allow multiple users of (broadcast %evl). | Florian Hahn | 1 | -1/+2
CSE may replace multiple redundant broadcasts of EVL with a single broadcast, which may then have more than one user. Adjust the verifier to allow this.
Fixes a crash when building llvm-test-suite with EVL: https://lab.llvm.org/buildbot/#/builders/210/builds/3303
19 hours | [VPlan] Mark VPInstruction::Broadcast as not reading/writing memory. | Florian Hahn | 1 | -0/+1
This enables additional DCE/CSE opportunities and ensures that we don't end up with multiple redundant users of a VPInstruction using EVL. It fixes a verifier error in the added test_3_inductions test.
22 hours | [X86] matchVPMADD52 - only use 512-bit MADD52 on AVX512IFMA targets (#161011) | Simon Pilgrim | 1 | -3/+5
If we have an AVX512 target capable of AVXIFMA but not AVX512IFMA, then we must split 512-bit (or larger) types into 256-bit pieces.
Fixes #160928
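The width decision can be sketched as follows, assuming the vector width is a multiple of the largest legal operation width. `madd52_pieces` and its boolean feature parameters are hypothetical stand-ins for the subtarget queries in matchVPMADD52; the sketch only encodes the rule stated in the commit message (AVXIFMA provides 128/256-bit forms, AVX512IFMA adds the 512-bit form).

```python
def madd52_pieces(vector_bits, has_avx512ifma, has_avxifma):
    """Widths of the MADD52 operations a vector_bits-wide multiply-add
    would be split into (hypothetical model)."""
    if has_avx512ifma:
        max_width = 512  # 512-bit VPMADD52 is available
    elif has_avxifma:
        max_width = 256  # only the 128/256-bit forms exist
    else:
        raise ValueError("no IFMA support")
    if vector_bits <= max_width:
        return [vector_bits]
    if vector_bits % max_width:
        raise ValueError("width must be a multiple of the legal width")
    return [max_width] * (vector_bits // max_width)
```

For example, a 512-bit operation on an AVXIFMA-only target becomes two 256-bit operations, which is the split the fix enforces.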
22 hours | [Support] Deprecate one form of support::endian::read (NFC) (#160979) | Kazu Hirata | 3 | -8/+8
This is a follow-up to #156140, which deprecated one form of write. We have two forms of read:
  template <typename value_type, std::size_t alignment>
  [[nodiscard]] inline value_type read(const void *memory, endianness endian)

  template <typename value_type, endianness endian, std::size_t alignment>
  [[nodiscard]] inline value_type read(const void *memory)
The difference is that endian is a function parameter in the former but a template parameter in the latter. This patch streamlines the code by migrating uses of the latter to the former while deprecating the latter.
23 hours | [ARM] Generate build-attributes more correctly in the presence of intrinsic declarations. (#160749) | David Green | 1 | -8/+11
This code doesn't work very well, but this makes it work when intrinsic definitions are present. It now discounts function declarations from the set of attributes it looks at. The code would have worked better before 0ab5b5b8581d9f2951575f7245824e6e4fc57dec, when module-level attributes could provide the information used to construct build-attributes.
29 hours | [LAA] Revert 56a1cbb and 1aded51, due to crash (#160993) | Ramkumar Ramachandra | 1 | -34/+22
This reverts commits 56a1cbb ([LAA] Fix non-NFC parts of 1aded51), 1aded51 ([LAA] Prepare to handle diff type sizes (NFC)). The original NFC patch caused some regressions, which the later patch tried to fix. However, the later patch is the cause of some crashes, and it would be best to revert both for now, and re-land after thorough testing.
34 hours | [InstCombine] Rotate transformation port from SelectionDAG to InstCombine (#160628) | Axel Sorenson | 1 | -0/+16
The rotate transformation from https://github.com/llvm/llvm-project/blob/72c04bb882ad70230bce309c3013d9cc2c99e9a7/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10312-L10337 has no middle-end equivalent in InstCombine. The following is a port of that transformation to InstCombine.
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
38 hours | Revert "[TTI][RISCV] Add cost modelling for intrinsic vp.load.ff (#160470)" | ShihPo Hung | 5 | -44/+0
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
39 hours | AMDGPU: Check if immediate is legal for av_mov_b32_imm_pseudo (#160819) | Matt Arsenault | 1 | -0/+9
This is primarily to avoid folding a frame index materialized into an SGPR into the pseudo; this would end up looking like:
  %sreg = s_mov_b32 %stack.0
  %av_32 = av_mov_b32_imm_pseudo %sreg
which is not useful.
Match the check used for the b64 case. This is limited to the pseudo to avoid regressions due to gfx908's special case: it expects to pass here with v_accvgpr_write_b32 for illegal cases, and to stay in the intermediate state with an SGPR input. This avoids regressions in a future patch.
40 hours | [llvm][mustache] Avoid excessive hash lookups in EscapeStringStream (#160166) | Paul Kirth | 1 | -8/+22
The naive char-by-char lookup performed OK, but we can skip ahead to the next match, avoiding all the extra hash lookups in the key map. There is likely a faster method than this, but it's already a 42% win in the BM_Mustache_StringRendering/Escaped benchmark, and an order-of-magnitude improvement for BM_Mustache_LargeOutputString.

| Benchmark | Before (ns) | After (ns) | Time reduction |
| :--- | ---: | ---: | ---: |
| `StringRendering/Escaped` | 29,440,922 | 16,583,603 | ~44% |
| `LargeOutputString` | 15,139,251 | 929,891 | ~94% |
| `HugeArrayIteration` | 102,148,245 | 95,943,960 | ~6% |
| `PartialsRendering` | 308,330,014 | 303,556,563 | ~1.6% |

Unreported benchmarks, like those for parsing, had no significant change.
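The skip-ahead idea can be sketched in Python: rather than looking up every character in the escape map, search for the next character that actually needs escaping and copy the run before it in one piece. The escape table below is an assumed HTML-style set, not necessarily the key map used by `EscapeStringStream`, and both function names are hypothetical.

```python
import re

# Assumed HTML-style escape table; the real key map may differ.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
# Character class matching any key of ESCAPES.
NEEDS_ESCAPE = re.compile("[" + "".join(re.escape(c) for c in ESCAPES) + "]")


def escape_naive(text: str) -> str:
    """One map lookup per input character."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def escape_skip_ahead(text: str) -> str:
    """Copy unescaped runs in bulk; look up only characters that match."""
    out = []
    pos = 0
    while True:
        match = NEEDS_ESCAPE.search(text, pos)
        if match is None:
            out.append(text[pos:])           # trailing run, nothing left to escape
            return "".join(out)
        out.append(text[pos:match.start()])  # run before the next escapable char
        out.append(ESCAPES[match.group()])   # single lookup for the match
        pos = match.end()


print(escape_skip_ahead('a<b & "c">'))  # a&lt;b &amp; &quot;c&quot;&gt;
```

The gain grows with the length of unescaped runs, which is consistent with the large improvement reported for the large-output benchmark.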
41 hours | [profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (#159645) | Mircea Trofin | 1 | -11/+75
Propagate `!prof` from `switch` instructions.
Issue #147390
42 hours | [NVPTX] legalize v2i32 to improve compatibility with v2f32 (#153478) | Princeton Ferro | 5 | -82/+124
Since v2f32 is legal but v2i32 is not, some sequences of operations like bitcast (build_vector) are lowered inefficiently.
43 hours | [clang] Use the VFS to create the OpenMP region entry ID (#160918) | Jan Svoboda | 1 | -6/+8
This PR uses the VFS to create the OpenMP target entry instead of going straight to the real file system. This matches the behavior of other input files of the compiler.
44 hours | [ASan][RISCV] Teach AddressSanitizer to support indexed load/store. (#160443) | Hank Chang | 2 | -0/+57
This patch is based on https://github.com/llvm/llvm-project/pull/159713.
This patch extends AddressSanitizer to support indexed/segment instructions in RVV. It enables proper instrumentation for these memory operations. A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to describe the offset between the base pointer and the actual memory reference address.
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
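To see what an offset between the base pointer and the accessed address means for RVV indexed accesses, here is a hypothetical Python model of the byte ranges such an access touches: element i starts at the base address plus the i-th index-vector offset, and segment fields follow it contiguously. Offsets are treated as unsigned byte offsets; index-width extension/truncation rules are not modeled, and `indexed_access_ranges` is a made-up name, not an ASan API.

```python
def indexed_access_ranges(base, offsets, elem_bytes, mask=None, nfields=1):
    """Byte ranges [start, end) touched by an RVV indexed (segment) access.

    Hypothetical model: field f of element i lives at
    base + offsets[i] + f * elem_bytes; masked-off elements are skipped.
    """
    ranges = []
    for i, off in enumerate(offsets):
        if mask is not None and not mask[i]:
            continue  # inactive element: no memory access
        start = base + off
        for f in range(nfields):
            field_start = start + f * elem_bytes
            ranges.append((field_start, field_start + elem_bytes))
    return ranges
```

Each returned range is a candidate for a shadow-memory check, which is why the instrumentation needs the per-element offset rather than just the base pointer.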
44 hours | [AMDGPU][True16][CodeGen] Avoid setting hi part in copysign (#160891) | Piotr Sobczak | 1 | -2/+3
This is a temporary fix for a regression from #154875. The new pattern sets the hi part of the V_BFI result, and that confuses si-fix-sgpr-copies, which is likely where the proper fix belongs. During si-fix-sgpr-copies, an incorrect fold happens:
  %86:vgpr_32 = V_BFI_B32_e64
  %87:sreg_32 = COPY %86.hi16:vgpr_32
  %95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, killed %87:sreg_32, 0, %63:vgpr_16, 0, 0
into
  %86:vgpr_32 = V_BFI_B32_e64
  %95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, %86.lo16:vgpr_32, 0, %63:vgpr_16, 0, 0
Fixes: Vulkan CTS dEQP-VK.glsl.builtin.precision_fp16_storage32b.*.
44 hours | [LLVM][M68k] Fix build failure caused by #160797 (#160926) | Rahul Joshi | 1 | -7/+4
Fix M68k build failures caused by https://github.com/llvm/llvm-project/pull/160797
44 hours | [RISCV] Update SiFive7's scheduling models with their optimizations on permutation instructions (#160763) | Min-Yih Hsu | 1 | -10/+94
In newer SiFive7 cores like X390, permutation instructions like vrgather.vv that operate on an LMUL smaller than a single DLEN take a constant number of cycles. For slightly larger data that fits the constraint `log2(SEW/8) + log2(LMUL) <= log2(DLEN / 32)`, these instructions take a number of cycles proportional to the square of LMUL, rather than proportional to VL.
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
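Because log2 is monotonic, the constraint is equivalent to (SEW/8) * LMUL <= DLEN/32, which can be checked exactly even for fractional LMUL. A small sketch of that check; the function name and the DLEN values used in the tests are illustrative assumptions, not X390 parameters.

```python
from fractions import Fraction


def in_quadratic_regime(sew, lmul, dlen):
    """Check log2(SEW/8) + log2(LMUL) <= log2(DLEN/32).

    log2(a) + log2(b) <= log2(c) holds exactly when a * b <= c (all positive),
    so the check is done with exact fractions instead of floating-point logs.
    """
    return Fraction(sew, 8) * Fraction(lmul) <= Fraction(dlen, 32)
```

With a hypothetical DLEN of 256, for instance, SEW=64 at LMUL=1 still satisfies the constraint (8 <= 8), while LMUL=2 does not (16 > 8).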
45 hours | [AMDGPU] Ensure divergence for v_alignbit (#129159) | Jeffrey Byrnes | 1 | -7/+7
Selecting a VGPR for the uniform version of this pattern may lead to unnecessary VGPR usage and waterfall loops.
45 hours | [DirectX] Validating Root flags are denying shader stage (#160919) | joaosaffran | 1 | -12/+60
Root signature flags can block compilation of certain shader stages. This PR implements a validation that notifies the user if they compile a root signature that denies such a shader stage.
Closes: https://github.com/llvm/llvm-project/issues/153062
Previously approved: https://github.com/llvm/llvm-project/pull/153287
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
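The validation can be modeled as a bitmask test between the root signature flags and the stage being compiled. The flag values below are the `DENY_*_SHADER_ROOT_ACCESS` bits of `D3D12_ROOT_SIGNATURE_FLAGS` as we understand them (an assumption worth checking against the D3D12 headers); the stage names, `validate_stage`, and the message text are hypothetical, and the real pass reports diagnostics through LLVM rather than returning strings.

```python
from enum import IntFlag


class RootFlags(IntFlag):
    """Subset of D3D12_ROOT_SIGNATURE_FLAGS (assumed values)."""
    NONE = 0
    DENY_VERTEX_SHADER_ROOT_ACCESS = 0x2
    DENY_HULL_SHADER_ROOT_ACCESS = 0x4
    DENY_DOMAIN_SHADER_ROOT_ACCESS = 0x8
    DENY_GEOMETRY_SHADER_ROOT_ACCESS = 0x10
    DENY_PIXEL_SHADER_ROOT_ACCESS = 0x20
    DENY_AMPLIFICATION_SHADER_ROOT_ACCESS = 0x100
    DENY_MESH_SHADER_ROOT_ACCESS = 0x200


# Deny flag that applies to each shader stage; stages without one
# (e.g. compute) can never be denied by these flags.
DENY_FLAG_FOR_STAGE = {
    "vertex": RootFlags.DENY_VERTEX_SHADER_ROOT_ACCESS,
    "hull": RootFlags.DENY_HULL_SHADER_ROOT_ACCESS,
    "domain": RootFlags.DENY_DOMAIN_SHADER_ROOT_ACCESS,
    "geometry": RootFlags.DENY_GEOMETRY_SHADER_ROOT_ACCESS,
    "pixel": RootFlags.DENY_PIXEL_SHADER_ROOT_ACCESS,
    "amplification": RootFlags.DENY_AMPLIFICATION_SHADER_ROOT_ACCESS,
    "mesh": RootFlags.DENY_MESH_SHADER_ROOT_ACCESS,
}


def validate_stage(flags, stage):
    """Return an error message if `flags` deny `stage`, else None."""
    deny = DENY_FLAG_FOR_STAGE.get(stage)
    if deny is not None and flags & deny:
        return f"root signature denies root access to the {stage} shader stage"
    return None
```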
45 hours | [Intrinsic] Unify IIT_STRUCT{2-9} into IIT_STRUCT to support up to 257 return values | Michael Liao | 1 | -23/+3
Currently, an intrinsic can only have up to 9 return values. If new intrinsics require more than 9 return values, additional IIT_STRUCTxxx values need to be added to support them. Instead, this patch unifies them into a single IIT_STRUCT followed by a byte specifying the number of return values, from a minimum of 2 (encoded as 0) to a maximum of 257 (encoded as 255).
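The count byte is a bias-by-two encoding. A minimal sketch of the round trip, with hypothetical helper names; the real encoding is emitted by TableGen and decoded in LLVM's IR library, neither of which is reproduced here.

```python
MIN_STRUCT_ELEMENTS = 2    # smallest count the byte can express (encoded as 0)
MAX_STRUCT_ELEMENTS = 257  # largest count the byte can express (encoded as 255)


def encode_struct_count(num_returns: int) -> int:
    """Encode a return-value count as the byte that follows IIT_STRUCT."""
    if not MIN_STRUCT_ELEMENTS <= num_returns <= MAX_STRUCT_ELEMENTS:
        raise ValueError(f"{num_returns} return values cannot be encoded")
    return num_returns - MIN_STRUCT_ELEMENTS


def decode_struct_count(encoded: int) -> int:
    """Recover the return-value count from the byte after IIT_STRUCT."""
    if not 0 <= encoded <= 0xFF:
        raise ValueError("encoded count must fit in one byte")
    return encoded + MIN_STRUCT_ELEMENTS
```

Biasing by two uses the full byte: a struct with 0 or 1 elements never needs IIT_STRUCT, so those code points would otherwise be wasted.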
45 hours | [DirectX] Updating DXContainer logic to read version 1.2 of static samplers (#160184) | joaosaffran | 2 | -4/+12
This PR updates `Object/DXContainer.h` so that we can read data from root signature version 1.2, which adds flags to static samplers.
46 hours | [SYCL] Add offload wrapping for SYCL kind (#147508) | Maksim Sabianin | 2 | -5/+416
This patch adds an Offload Wrapper for the SYCL kind. This is an essential step for SYCL offloading and the compilation flow. The usage of offload wrapping is added to the clang-linker-wrapper tool.
Modifications:
- Implemented the `bundleSYCL()` function to handle SYCL image bundling.
- Implemented the `wrapSYCLBinaries()` function, which is invoked from clang-linker-wrapper.
SYCL offload wrapping uses specific data structures such as `__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image maintains its own symbol table (unlike the shared global tables in other targets); therefore, symbols are encoded explicitly during offload wrapping. Also, unlike other targets, images refer to their own Offloading Entries arrays. The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate it from images generated by Intel DPC++. The structure proposed in this patch doesn't have the fields deprecated in DPC++.
46 hours | [NFC][PowerPC] Consolidate predicate definitions into PPC.td (#160579) | Lei Huang | 7 | -71/+66
Consolidate predicate definitions into `PPC.td`, the top-level entry point for the PowerPC target, and remove duplicate definitions for 32/64-bit sub-target checks.
46 hours | [DirectX] Updating DXContainer Yaml to represent Root Signature 1.2 (#159659) | joaosaffran | 6 | -4/+60
This PR updates the YAML representation of DXContainer to support Root Signature 1.2, which also requires updating the write logic to support testing.
47 hours | [DirectX] Adding missing descriptor table validations (#153276) | joaosaffran | 2 | -3/+70
This patch adds two small validations to the DirectX backend. First, it checks that registers in descriptor tables do not overflow, meaning they don't try to bind registers beyond the maximum allowed value; this is checked both on the offset and on the number of descriptors inside the range. Second, it checks whether samplers are being mixed with other resource types.
Closes: #153057, #153058
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
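Both checks can be sketched over a simplified table model. This is a hypothetical illustration, not the backend's data structures: it assumes a 32-bit register and offset space (`MAX_REGISTER`), models only bounded ranges, and does not handle unbounded ranges or append-style offsets.

```python
from dataclasses import dataclass
from typing import List

MAX_REGISTER = 0xFFFF_FFFF  # assumption: 32-bit register and offset space


@dataclass
class DescriptorRange:
    kind: str             # "SRV", "UAV", "CBV" or "Sampler"
    base_register: int    # first register bound by the range
    num_descriptors: int  # bounded ranges only
    offset: int           # offset of the range from the start of the table


def validate_table(ranges: List[DescriptorRange]) -> List[str]:
    errors = []
    # Check 1: neither the registers nor the table offsets may run past the limit.
    for r in ranges:
        if r.base_register + r.num_descriptors - 1 > MAX_REGISTER:
            errors.append(f"{r.kind} range overflows register space")
        if r.offset + r.num_descriptors - 1 > MAX_REGISTER:
            errors.append(f"{r.kind} range overflows table offsets")
    # Check 2: samplers may not share a table with other resource types.
    kinds = {r.kind for r in ranges}
    if "Sampler" in kinds and len(kinds) > 1:
        errors.append("samplers mixed with other resource types")
    return errors
```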
47 hours | [NFC][LLVM] Pass/return SMLoc by value instead of const reference (#160797) | Rahul Joshi | 5 | -22/+19
SMLoc itself encapsulates just a pointer, so there is no need to pass or return it by reference.
47 hours | [NFC][LLVM] Use Unix line endings for a few source files (#160794) | Rahul Joshi | 2 | -541/+541
47 hours | [NFC][LLVM] Fix line endings for DXILABI.cpp (#160791) | Rahul Joshi | 1 | -33/+33
Fix line endings to Unix style by running dos2unix on this file.
47 hours | PeepholeOpt: Use initializer list (#160898) | Matt Arsenault | 1 | -2/+1
47 hours | [llvm][clang] Use the VFS in `FileCollector` (#160788) | Jan Svoboda | 1 | -5/+6
This PR changes `llvm::FileCollector` to use the `llvm::vfs::FileSystem` API for making file paths absolute instead of using `llvm::sys::fs::make_absolute()` directly. This matches the behavior of the compiler on most other input files.
47 hours | Greedy: Make trySplitAroundHintReg try to match hints with subreg copies (#160294) | Matt Arsenault | 1 | -12/+28
This is essentially the same patch as 116ca9522e89f1e4e02676b5bbe505e80c4d4933: when trying to match a physreg hint, try to find a compatible physreg if there is a subregister copy. This has the slight difference of using getSubReg on the hint instead of getMatchingSuperReg (the other use should also use getSubReg instead; it's faster). At the moment this turns out to have very little effect. The adjacent code needs better handling of subregisters, so continue adding this piecemeal. The X86 test shows a net reduction in real instructions, plus a few new kills.
48 hours | Revert "[RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable [nfc]" (#160897) | Philip Reames | 1 | -3/+3
Reverts llvm/llvm-project#160765. Failures on buildbot indicate the second assertion does not in fact hold.
48 hours | [RegAlloc] Add printer and dump for VNInfo [nfc] (#160758) | Philip Reames | 1 | -12/+18
Uses the existing format of the LiveRange printer, and just factors it out so that you can call vni->dump() when debugging, or log a VNInfo in a debug print statement.
2 days | [AArch64][GlobalISel] Add support for ldexp (#160517) | Ryan Cowan | 1 | -1/+1
2 days | [RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable [nfc] (#160765) | Philip Reames | 1 | -3/+3
We should always be able to find the VNInfo in the original live interval that corresponds to the subset we're trying to spill, and the only cases where we have a VNInfo without a defining instruction are if the VNI is unused or corresponds to a PHI. Adjust the code structure to explicitly check for PHIDef, and assert the stronger conditions.
2 days | [RegAlloc] Add additional tracing in InlineSpiller::rematerializeFor (#160761) | Philip Reames | 1 | -2/+11
We didn't have trace logging for two cases in this routine, which sometimes makes it hard to tell what is going on. In addition to debug trace statements, add comments to explain the logic behind the early exits which don't mark the virtual register live. Suggestions on how to word these more precisely are very welcome; I'm not sure I understand all the intricacies of this code myself.
2 days | [CodeGen] Adjust global-split remat heuristic to match LICM (#160709) | Philip Reames | 1 | -1/+2
This heuristic was originally added in 40c4aa with the stated purpose of avoiding global split on long live ranges created by MachineLICM hoisting trivially rematerializable instructions. In the meantime, various backends have introduced non-trivial rematerialization cases, MachineLICM gained an explicit triviality check, and we've reworked our APIs to match naming-wise. Let's move this heuristic back to truly trivial remat only.
This is a functional change, though somewhat hard to hit. This change will cause non-trivially rematerializable instructions to be globally split more often. This is likely a good thing, since non-trivial remat may not be legal at all possible points in the live interval, but it may cost slightly more compile time. I don't have a motivating example; I found it when reviewing the callers of isReMaterializable(MI).
2 days | [Flang][OpenMP] Enable no-loop kernels (#155818) | Dominik Adamski | 1 | -11/+12
Enable the generation of no-loop kernels for Fortran OpenMP code. `target teams distribute parallel do` directives can be promoted to no-loop kernels if the user adds the -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription flags. If the OpenMP kernel contains reduction or num_teams clauses, it is not promoted to no-loop mode. The global OpenMP device RTL oversubscription flags no longer force no-loop code generation for Fortran.
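The promotion rule described here reduces to a simple predicate. The sketch below is a hypothetical model (function and parameter names are made up), not flang's implementation; it only encodes the three conditions stated in the commit message.

```python
REQUIRED_FLAGS = {
    "-fopenmp-assume-teams-oversubscription",
    "-fopenmp-assume-threads-oversubscription",
}
BLOCKING_CLAUSES = {"reduction", "num_teams"}


def can_use_no_loop(construct, flags, clauses):
    """True if a kernel may be promoted to no-loop mode (hypothetical model)."""
    return (
        construct == "target teams distribute parallel do"
        and REQUIRED_FLAGS <= set(flags)          # both flags must be given
        and not BLOCKING_CLAUSES & set(clauses)   # no reduction / num_teams
    )
```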