path: root/llvm
Age | Commit message | Author | Files | Lines
2022-04-13 | [NFC][CodeGen] Use ArrayRef in TargetLowering functions | Shao-Ce SUN | 2 | -134/+100
This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123467
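The new overloads let a single call apply the same action to a whole list of opcodes, instead of needing one call per opcode. The following is a minimal, self-contained sketch of that overload pattern and is not LLVM code. `TargetLoweringSketch`, the plain `unsigned` opcodes and value types, and the `Action` enum are hypothetical stand-ins. `std::initializer_list` replaces `llvm::ArrayRef` so the snippet compiles on its own.

```cpp
#include <initializer_list>
#include <map>
#include <utility>

// Hypothetical stand-ins for ISD opcodes, MVTs and legalize actions.
enum class Action { Legal, Expand, Custom };
using Opcode = unsigned;
using ValueType = unsigned;

class TargetLoweringSketch {
  std::map<std::pair<Opcode, ValueType>, Action> Actions;

public:
  // Single-element form, analogous to the pre-existing API.
  void setOperationAction(Opcode Op, ValueType VT, Action A) {
    Actions[{Op, VT}] = A;
  }
  // List form: forwards each opcode to the single-element overload.
  void setOperationAction(std::initializer_list<Opcode> Ops, ValueType VT,
                          Action A) {
    for (Opcode Op : Ops)
      setOperationAction(Op, VT, A);
  }
  // List form over both opcodes and value types.
  void setOperationAction(std::initializer_list<Opcode> Ops,
                          std::initializer_list<ValueType> VTs, Action A) {
    for (ValueType VT : VTs)
      setOperationAction(Ops, VT, A);
  }
  // Unset entries report Legal in this sketch.
  Action getOperationAction(Opcode Op, ValueType VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? Action::Legal : It->second;
  }
};
```

For example, `TL.setOperationAction({1, 2, 3}, 32, Action::Expand);` marks three hypothetical opcodes as Expand for the same type in one statement. Each list overload forwards to the single-element form, which keeps the per-entry logic in one place.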
2022-04-12 | [AMDGPU][Codegen] Unsupported image sample texture map instructions | Anshil Gandhi | 4 | -9/+157
Disables image_sample_*_g16 instructions on architectures lacking g16 support. This patch fixes the issue 54672. Differential Revision: https://reviews.llvm.org/D123461
2022-04-12 | [SimplifyCFG] cleanup code for converting switch to select (NFC) | Sanjay Patel | 1 | -33/+33
This renames functions for more general usage (and current capitalization style) before a proposed logic change in D122485. Differential Revision: https://reviews.llvm.org/D123614
2022-04-12 | [AArch64] Async unwind - function epilogues | Momchil Velikov | 50 | -296/+1914
Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 | [AMDGPU] Use default member initializers in Subtarget classes | Jay Foad | 5 | -275/+137
Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613
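Here is a small illustration of the pattern that the clang-tidy fix-its produce. Each flag is initialized where it is declared, so a newly added flag starts out `false` even if no constructor mentions it. `SubtargetSketch` and its feature names are hypothetical and do not come from AMDGPUSubtarget.h.

```cpp
// Hypothetical subtarget class; the feature names are illustrative only.
class SubtargetSketch {
protected:
  // The default sits next to the declaration. Before the change, a flag like
  // HasFeatureB had to be initialized separately in the .cpp constructor,
  // and forgetting it left the value indeterminate.
  bool HasFeatureA = false;
  bool HasFeatureB = false;

public:
  SubtargetSketch() = default;
  // A constructor only needs to mention members whose value differs.
  explicit SubtargetSketch(bool EnableA) : HasFeatureA(EnableA) {}

  bool hasFeatureA() const { return HasFeatureA; }
  bool hasFeatureB() const { return HasFeatureB; }
};
```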
2022-04-12 | [gn build] Fix a URL in a comment | Nico Weber | 1 | -1/+1
2022-04-12 | [InstSimplify] Don't fold phi of poison and trapping const expr (PR49839) | Nikita Popov | 2 | -5/+13
Folding this case would result in the constant expression being executed unconditionally, which may introduce a new trap. Fixes https://github.com/llvm/llvm-project/issues/49839.
2022-04-12 | [InstSimplify] Add test for PR49839 (NFC) | Nikita Popov | 1 | -0/+42
2022-04-12 | [AMDGPU] Split unaligned 3 DWORD DS operations | Stanislav Mekhanoshin | 4 | -31/+18
I have written a minitest to check the performance. Overall, the benefit of aligned b96 operations on data whose alignment is not known but which happens to be aligned is small, while the performance hit of using b96 operations on truly unaligned memory is high. The only exception is data that is not even 4-byte aligned; in that case it is better to use b96. Here is the test output on Vega and Navi:
```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-
ds_write_b96 aligned: 3.4 sec
ds_write_b32 + ds_write_b64 aligned: 4.5 sec
ds_write_b32 * 3 aligned: 4.8 sec
ds_write_b96 misaligned by 1: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 7.2 sec
ds_write_b32 * 3 misaligned by 1: 10.0 sec
ds_write_b96 misaligned by 2: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 7.2 sec
ds_write_b32 * 3 misaligned by 2: 10.1 sec
ds_write_b96 misaligned by 4: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.2 sec
ds_write_b32 * 3 misaligned by 4: 4.9 sec
ds_write_b96 misaligned by 8: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 4.6 sec
ds_write_b32 * 3 misaligned by 8: 4.9 sec
ds_read_b96 aligned: 3.3 sec
ds_read_b32 + ds_read_b64 aligned: 4.9 sec
ds_read_b32 * 3 aligned: 2.6 sec
ds_read_b96 misaligned by 1: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 7.2 sec
ds_read_b32 * 3 misaligned by 1: 10.1 sec
ds_read_b96 misaligned by 2: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 7.2 sec
ds_read_b32 * 3 misaligned by 2: 10.1 sec
ds_read_b96 misaligned by 4: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 2.6 sec
ds_read_b32 * 3 misaligned by 4: 2.6 sec
ds_read_b96 misaligned by 8: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 4.9 sec
ds_read_b32 * 3 misaligned by 8: 2.6 sec
Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030
ds_write_b96 aligned: 4.1 sec
ds_write_b32 + ds_write_b64 aligned: 13.0 sec
ds_write_b32 * 3 aligned: 4.5 sec
ds_write_b96 misaligned by 1: 12.5 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 22.0 sec
ds_write_b32 * 3 misaligned by 1: 31.5 sec
ds_write_b96 misaligned by 2: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 22.0 sec
ds_write_b32 * 3 misaligned by 2: 31.5 sec
ds_write_b96 misaligned by 4: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.0 sec
ds_write_b32 * 3 misaligned by 4: 4.5 sec
ds_write_b96 misaligned by 8: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 13.0 sec
ds_write_b32 * 3 misaligned by 8: 4.5 sec
ds_read_b96 aligned: 3.8 sec
ds_read_b32 + ds_read_b64 aligned: 12.8 sec
ds_read_b32 * 3 aligned: 4.4 sec
ds_read_b96 misaligned by 1: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 21.8 sec
ds_read_b32 * 3 misaligned by 1: 31.5 sec
ds_read_b96 misaligned by 2: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 21.9 sec
ds_read_b32 * 3 misaligned by 2: 31.5 sec
ds_read_b96 misaligned by 4: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 3.8 sec
ds_read_b32 * 3 misaligned by 4: 4.5 sec
ds_read_b96 misaligned by 8: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 12.8 sec
ds_read_b32 * 3 misaligned by 8: 4.5 sec
```
Fixes: SWDEV-330802 Differential Revision: https://reviews.llvm.org/D123524
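The conclusion drawn from these numbers can be expressed as a predicate on the known alignment of a 12-byte LDS access. This sketch rests on stated assumptions and is not the code from D123524. `preferSingleB96` is a hypothetical name, and treating 16 bytes as the alignment at which a b96 access counts as "aligned" is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: should a 12-byte (3 DWORD) LDS access with the given
// known alignment (in bytes, a power of two) be emitted as one b96 op?
// Assumption: "aligned" in the benchmark means 16-byte alignment.
bool preferSingleB96(uint64_t KnownAlignBytes) {
  const uint64_t RequiredAlign = 16;
  // Naturally aligned: keep the single b96 op.
  if (KnownAlignBytes >= RequiredAlign)
    return true;
  // Less than DWORD aligned: narrower ops are at least as slow and there are
  // more of them, so a single b96 op pays the misalignment penalty only once.
  if (KnownAlignBytes < 4)
    return true;
  // 4- or 8-byte aligned: split into b32 + b64 or 3 x b32.
  return false;
}
```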
2022-04-12 | [AMDGPU] Refactor LDS alignment checks. | Stanislav Mekhanoshin | 3 | -103/+77
Move features/bugs checks into the single place allowsMisalignedMemoryAccessesImpl. This is mostly NFCI except for the order of selection in a couple of places. A separate change may be needed to stop lying about Fast. Differential Revision: https://reviews.llvm.org/D123343
2022-04-12 | [X86] getFauxShuffleMask - remove use DemandedElts TODO | Simon Pilgrim | 1 | -1/+0
Most of the getTargetShuffleInputs recursive calls have now gone and the remaining uses aren't likely to benefit from a DemandedElts mask.
2022-04-12 | [ValueTracking] Make getStringLength aware of strdup | serge-sans-paille | 5 | -26/+60
During strlen compile-time evaluation, make it possible to track the size of strdup'ed strings. Differential Revision: https://reviews.llvm.org/D123497
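Here is a toy model of what tracking strdup during strlen evaluation means. The length of a strdup result is the length of its argument, so the analysis can recurse through the call. `Value`, `getStringLengthSketch`, and the three-kind value model are hypothetical. Returning the length plus one for the terminating NUL, and 0 for "unknown", is modeled on LLVM's `GetStringLength` helper, but treat that convention as an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>

// Toy value model (hypothetical, not LLVM IR): a value is a constant C string,
// a call to strdup on another value, or something the analysis cannot see.
struct Value {
  enum Kind { ConstString, StrdupCall, Unknown } K;
  std::string Str;            // Valid when K == ConstString (no embedded NUL).
  const Value *Arg = nullptr; // Valid when K == StrdupCall.
};

// Returns strlen(V) + 1 when it is known at compile time, or 0 when unknown.
uint64_t getStringLengthSketch(const Value &V) {
  switch (V.K) {
  case Value::ConstString:
    return V.Str.size() + 1;
  case Value::StrdupCall:
    // strdup copies its argument, so the copy has the same length.
    return V.Arg ? getStringLengthSketch(*V.Arg) : 0;
  case Value::Unknown:
    break;
  }
  return 0;
}
```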
2022-04-12 | [AMDGPU][DOC][NFC] Updated GFX10 assembler syntax description | Dmitry Preobrazhensky | 97 | -2217/+2228
The description has been updated to reflect AMDGPU MC changes:
- enabled literals for src0 of v_fmaak_f*, v_fmamk_f*, v_madak_f32, v_madmk_f32;
- enabled global_atomic_fcmpswap and global_atomic_fcmpswap_x2;
- enabled dlc with flat_atomic* and global_atomic_*.
Bug fixing and improvements:
- enabled s_wait_idle;
- enabled s_waitcnt_depctr;
- added description of s_waitcnt_depctr syntactic sugar;
- disabled SYSMSG_OP_HOST_TRAP_ACK (it is not supported on GFX10);
- corrected description of lgkmcnt (accept values from 0 to 63).
2022-04-12 | [AMDGPU][DOC][NFC] Updated GFX1030 assembler syntax description | Dmitry Preobrazhensky | 16 | -1039/+1062
Summary of changes:
- enabled null for VOP operands;
- added description of s_waitcnt_depctr syntactic sugar.
2022-04-12 | [DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds | Simon Pilgrim | 2 | -60/+37
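For the srl form, "exact" guarantees that the low C1 bits of X are zero. Under that guarantee, (X >> C1) << C2 equals X << (C2 - C1) when C1 <= C2, and X >> (C1 - C2) when C1 >= C2. The sketch below models a 4 x i32 vector whose shift amounts differ per lane (non-uniform). It requires every lane to agree on the direction before folding, which is an assumption about how the combine treats non-uniform constants. `Lanes`, `shlOfSrl`, and `foldShlOfExactSrl` are hypothetical names. The sra form, which uses an arithmetic right shift, is omitted, and all shift amounts are assumed to be below 32.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

using Lanes = std::array<uint32_t, 4>;

// Reference semantics: (shl (srl exact X, C1), C2), lane by lane.
Lanes shlOfSrl(const Lanes &X, const Lanes &C1, const Lanes &C2) {
  Lanes R{};
  for (size_t I = 0; I < R.size(); ++I)
    R[I] = (X[I] >> C1[I]) << C2[I];
  return R;
}

// Folded form: if every lane has C1 <= C2, emit one shl by (C2 - C1);
// if every lane has C1 >= C2, emit one srl by (C1 - C2). Mixed lanes: no fold.
std::optional<Lanes> foldShlOfExactSrl(const Lanes &X, const Lanes &C1,
                                       const Lanes &C2) {
  bool AllLE = true, AllGE = true;
  for (size_t I = 0; I < X.size(); ++I) {
    AllLE &= C1[I] <= C2[I];
    AllGE &= C1[I] >= C2[I];
  }
  if (!AllLE && !AllGE)
    return std::nullopt;
  Lanes R{};
  for (size_t I = 0; I < X.size(); ++I)
    R[I] = AllLE ? X[I] << (C2[I] - C1[I]) : X[I] >> (C1[I] - C2[I]);
  return R;
}
```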
2022-04-12 | [X86] Fix extact -> exact typo in test names | Simon Pilgrim | 1 | -16/+16
2022-04-12 | [gn build] Port 95f0f69f1ff8 | LLVM GN Syncbot | 1 | -1/+0
2022-04-12 | [InlineCost] Check that function types match | Nikita Popov | 2 | -3/+30
Retain the behavior we get without opaque pointers: A call to a known function with different function type is considered an indirect call. This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
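Here is a toy model of the rule. A call counts as direct only if the callee is known and its function type matches the type used at the call site; otherwise it is analyzed as an indirect call. `Function`, `CallSite`, and `getDirectCallee` are hypothetical. Function types are represented as plain strings purely for illustration.

```cpp
#include <string>

// Toy model (hypothetical): a function type is represented by its signature.
struct Function {
  std::string Name;
  std::string FnType;
};

struct CallSite {
  const Function *Callee; // Known callee, or nullptr for a truly indirect call.
  std::string CallType;   // Function type used at the call site.
};

// With opaque pointers a call can name a known function while using a
// different function type. Such a call is treated as indirect, which is
// what the typed-pointer world produced via a bitcast.
const Function *getDirectCallee(const CallSite &CS) {
  if (CS.Callee && CS.Callee->FnType == CS.CallType)
    return CS.Callee;
  return nullptr;
}
```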
2022-04-12 | [gn build] Port 5a5be4044f0b | LLVM GN Syncbot | 1 | -0/+1
2022-04-12 | workflow: When updating the issueXX branch, use force push | Tobias Hieta | 1 | -1/+1
Otherwise, if you try to update the branch with a new /cherry-pick from the same issue, you will run into problems similar to the one shown in this workflow: https://github.com/llvm/llvm-project/runs/5864672298?check_suite_focus=true Reviewed By: tstellar Differential Revision: https://reviews.llvm.org/D123365
2022-04-12 | [llvm-pdbutil] Fix broken '-modi' option after change D122226. | Carlos Alberto Enciso | 5 | -6/+48
The change in https://reviews.llvm.org/D122226 moved some llvm-pdbutil functionality to the debug PDB library. This patch fixes the broken handling of the '-modi' argument, which caused an assertion failure if its value was anything other than '0' or '1'. In addition, it moves the assertion on the number of occurrences of the '-modi' argument from the PDB library into the llvm-pdbutil driver. Reviewed By: zequanwu Differential Revision: https://reviews.llvm.org/D123483
2022-04-12 | [AMDGPU] Graceful abort for waterfalls in SIOptimizeVGPRLiveRange | Carl Ritson | 2 | -3/+129
If the CFG structure of a waterfall loop is not the expected shape, gracefully abort traversing the IR for the given loop. This applies to nested waterfall loops, which are not supported by the VGPR live range optimizer. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D123480
2022-04-12 | [AMDGPU] Pre-commit test for D123569. NFC. | Carl Ritson | 1 | -0/+81
2022-04-12 | [InstCombine] fold more constant remainder to select-of-constants remainder | Liqin Weng | 2 | -11/+8
Reviewed By: xbolva00, spatel, Chenbing.Zheng Differential Revision: https://reviews.llvm.org/D123486
2022-04-12 | [InstCombine] Fold icmp(X) ? f(X) : C | Alexander Shaposhnikov | 2 | -10/+38
This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate. This addresses the issue https://github.com/llvm/llvm-project/issues/54089. Differential revision: https://reviews.llvm.org/D123159 Test plan: make check-all
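The following concrete instance of the fold was chosen for illustration and does not come from the patch or its tests. In `(X u> 3) ? (X >> 2) : 0`, the false arm is reached only for X in [0, 3]. On that range `X >> 2` is already 0, so the select can be replaced by `X >> 2` alone. The two hypothetical functions below model the before and after forms on 16-bit unsigned values.

```cpp
#include <cstdint>

// Before: icmp(X) ? f(X) : C with icmp = (X u> 3), f(X) = X >> 2, C = 0.
uint16_t selectForm(uint16_t X) { return X > 3 ? X >> 2 : 0; }

// After: for every X satisfying the inverse predicate (X u<= 3),
// f(X) == 0 == C, so the select reduces to f(X).
uint16_t foldedForm(uint16_t X) { return X >> 2; }
```

Because the domain is only 16 bits wide, the two forms can be compared exhaustively.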
2022-04-12 | [InstCombine][NFC] Add baseline tests for folds icmp(X) ? f(X) : C | Alexander Shaposhnikov | 1 | -0/+65
Differential revision: https://reviews.llvm.org/D123430 Test plan: make check-all
2022-04-11 | [SelectionDAG] Remove unnecessary null check after call to getNode. NFC | Craig Topper | 1 | -3/+2
As far as I know getNode will never return a null SDValue. I'm guessing this was modeled after the FoldConstantArithmetic call earlier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123550
2022-04-11 | GlobalISel: Verify atomic load/store ordering restriction | Matt Arsenault | 4 | -2/+28
Reject acquire stores and release loads. This matches the restriction imposed by the LLParser and IR verifier.
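The restriction can be sketched as two predicates. The enum below is a local copy that mirrors the names of LLVM's `AtomicOrdering` values so that the snippet is self-contained. The predicates are hypothetical helpers, not the verifier's code. Following the LangRef wording, they also reject `acq_rel` on both loads and stores. This commit message does not say whether the GlobalISel verifier also checks that case.

```cpp
// Local mirror of the LLVM AtomicOrdering value names (self-contained copy).
enum class AtomicOrdering {
  NotAtomic,
  Unordered,
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// A load cannot publish prior writes, so release semantics make no sense.
bool isValidLoadOrdering(AtomicOrdering O) {
  return O != AtomicOrdering::Release && O != AtomicOrdering::AcquireRelease;
}

// A store cannot observe later reads, so acquire semantics make no sense.
bool isValidStoreOrdering(AtomicOrdering O) {
  return O != AtomicOrdering::Acquire && O != AtomicOrdering::AcquireRelease;
}
```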
2022-04-11 | AArch64/GlobalISel: Regenerate mir test checks | Matt Arsenault | 56 | -4168/+4753
Minimizes the test diffs in future changes from introduction of -NEXT.
2022-04-11 | [gn build] Port 203a1e36ed75 | LLVM GN Syncbot | 1 | -1/+0
2022-04-11 | GlobalISel: Add memSizeNotByteSizePow2 legality helper | Matt Arsenault | 5 | -8/+30
This is really a replacement for memSizeInBytesNotPow2 that actually does what almost every target wants. In particular, since s1 rounds to 1 byte, it wasn't lowered by the old predicate. This results in targets needing to think harder and add more matchers to catch all the degenerate cases. Also fixes a small bug that prevented the correct insertion of G_ASSERT_ZEXT in the AArch64 use case.
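The difference between the two predicates comes down to whether the memory size is rounded up to whole bytes before the power-of-two test. Below is a minimal sketch on plain bit counts. Both functions are hypothetical stand-ins for the LegalityQuery-based predicates, and the round-up behavior of the old check is inferred from the note that s1 rounds to 1 byte.

```cpp
#include <cstdint>

bool isPowerOf2(uint64_t V) { return V && !(V & (V - 1)); }

// Old-style check: the byte count is rounded up, so an s1 (1-bit) access
// counts as 1 byte, which is a power of two, and is NOT flagged.
bool memSizeInBytesNotPow2Sketch(uint64_t SizeInBits) {
  return !isPowerOf2((SizeInBits + 7) / 8);
}

// New-style check: flag anything that is not a whole, power-of-two number
// of bytes, which also catches s1, s4, s24, and similar sizes.
bool memSizeNotByteSizePow2Sketch(uint64_t SizeInBits) {
  return SizeInBits % 8 != 0 || !isPowerOf2(SizeInBits / 8);
}
```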
2022-04-11 | GlobalISel: Implement computeKnownBits for overflow bool results | Matt Arsenault | 20 | -2549/+1465
2022-04-11 | AMDGPU/GlobalISel: Add some additional IR tests for zextload | Matt Arsenault | 1 | -0/+115
2022-04-11 | AMDGPU/GlobalISel: Add more tests for inreg extend + load combine | Matt Arsenault | 2 | -1/+354
2022-04-11 | Mips/GlobalISel: Remove test IR sections and regenerate checks | Matt Arsenault | 4 | -169/+153
2022-04-11 | AArch64/GlobalISel: Remove IR section from a test | Matt Arsenault | 1 | -75/+35
2022-04-11 | AMDGPU/GlobalISel: Remove unused parameter | Matt Arsenault | 3 | -38/+30
2022-04-11 | Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" | Matt Arsenault | 7 | -108/+2
This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d. The unrelated failure this exposed was fixed.
2022-04-11 | AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5 | Changpeng Fang | 4 | -99/+72
Summary: In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads and align the implicit kernel argument segment to an 8-byte boundary. In this work, we simply force this alignment through the first implicit argument. In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used. Reviewers: arsenm, b-sumner Differential Revision: https://reviews.llvm.org/D123346
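The alignment itself is ordinary round-up arithmetic. This sketch assumes that the implicit segment starts right after the explicit arguments. `implicitArgsOffset` is a hypothetical helper, not the AMDGPU metadata emitter.

```cpp
#include <cstdint>

// Round Offset up to the next multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Offset, uint64_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: byte offset of the first implicit argument when the
// explicit kernel arguments occupy ExplicitBytes bytes and the implicit
// segment must start on an 8-byte boundary.
uint64_t implicitArgsOffset(uint64_t ExplicitBytes) {
  return alignTo(ExplicitBytes, 8);
}
```

For example, 12 bytes of explicit arguments place the implicit segment at offset 16, not 12.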
2022-04-11 | [VFS] RedirectingFileSystem only replace path if not already mapped | Ben Barham | 3 | -12/+52
If the `ExternalFS` has already remapped to an external path, then `RedirectingFileSystem` should not change it to the originally provided path. This fixes the original path always being used if multiple VFS overlays were provided and the path wasn't found in the highest (i.e., first in the chain). For now this is accomplished through the use of a new `ExposesExternalVFSPath` field on `vfs::Status`. This flag is true when the `Status` has an external path that's different from its virtual path, i.e., the contained path is the external path. See the plan in `FileManager::getFileRef` for where this is going: eventually we won't need `IsVFSMapped` any more and all returned paths should be virtual. Resolves rdar://90578880 and llvm-project#53306. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D123398
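Here is a sketch of the new rule in a simplified setting. `StatusSketch` keeps only the two fields needed here. `adjustStatusName` is a hypothetical stand-in for the step where `RedirectingFileSystem` decides which path to report for a status returned by the filesystem beneath it.

```cpp
#include <string>

// Minimal stand-in for vfs::Status: only the fields this sketch needs.
struct StatusSketch {
  std::string Name;                    // Path reported to the client.
  bool ExposesExternalVFSPath = false; // Name is already an external path.
};

// Hypothetical post-processing step in an overlay layer. OriginalPath is the
// path the client asked for. If a lower overlay already remapped the path to
// an external one, keep that path; otherwise report the requested path.
StatusSketch adjustStatusName(StatusSketch S, const std::string &OriginalPath) {
  if (!S.ExposesExternalVFSPath)
    S.Name = OriginalPath;
  return S;
}
```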
2022-04-11 | [CMake][gn][Bazel] Remove HAVE_PTHREAD_GETSPECIFIC | Fangrui Song | 3 | -7/+0
The only user was removed by d351f54a076edf24c2a2bfda7cc7e3313ee3eecf.
2022-04-11 | [RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64. | Craig Topper | 6 | -27/+35
Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951
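The i32 constant -1 (0xFFFFFFFF) shows why sign extension helps on RV64. Sign-extended, it is -1, which fits the 12-bit signed immediate of ADDI and can be materialized with a single instruction. Zero-extended, it is 0x00000000FFFFFFFF, which does not fit and needs a longer sequence. Below is a small sketch of that arithmetic; the helper names are hypothetical.

```cpp
#include <cstdint>

// 64-bit value of an i32 constant after sign extension. The uint32_t ->
// int32_t conversion wraps in two's complement (guaranteed since C++20 and
// the behavior of mainstream compilers before that).
int64_t sext32(uint32_t C) {
  return static_cast<int64_t>(static_cast<int32_t>(C));
}

// 64-bit value of an i32 constant after zero extension.
uint64_t zext32(uint32_t C) { return C; }

// RISC-V ADDI takes a 12-bit signed immediate, so "li rd, imm" is a single
// instruction when imm is in [-2048, 2047].
bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }
```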
2022-04-11 | [Support] Remove unused/uncompilable !HAVE_PTHREAD_GETSPECIFIC code path | Fangrui Song | 1 | -12/+0
lib/Support/ThreadLocal.cpp has been uncompilable since rL158346 (2012-06) when `data` became a char array. The error looks like
```
...llvm/lib/Support/Unix/ThreadLocal.inc:66:57: error: array type 'char[8]' is not assignable
void ThreadLocalImpl::setInstance(const void* d) { data = const_cast<void*>(d);}
```
2022-04-11 | Value::isTransitiveUsedByMetadataOnly: Don't repeatedly add an element to the worklist. NFC | Fangrui Song | 1 | -7/+3
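The usual way to avoid re-adding an element is to record it in a visited set and push it only on first insertion. With this approach, a node reachable along several paths is processed once, and cycles terminate. Below is a generic sketch of that pattern on a hypothetical use graph (`Node`, `countTransitiveUsers`). It is not the body of `isTransitiveUsedByMetadataOnly`.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy use graph (hypothetical): each node lists the nodes that use it.
struct Node {
  std::vector<const Node *> Users;
};

// Counts every transitive user exactly once. A node is pushed only when it is
// first inserted into Visited, so the worklist never holds duplicates.
size_t countTransitiveUsers(const Node &Root) {
  std::vector<const Node *> Worklist{&Root};
  std::unordered_set<const Node *> Visited{&Root};
  size_t Count = 0;
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    for (const Node *U : N->Users)
      if (Visited.insert(U).second) {
        Worklist.push_back(U);
        ++Count;
      }
  }
  return Count;
}
```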
2022-04-11 | [test] Remove references to -fexperimental-new-pass-manager in tests | Arthur Eubanks | 12 | -12/+12
This has been the default for a while and we're in the process of removing the legacy PM optimization pipeline.
2022-04-11 | AArch64 adding more tests to show the simple scenarios for or/and combine | Biplob Mishra | 1 | -0/+68
2022-04-11 | [InstCombine] guard against splat-mul corner case | Sanjay Patel | 2 | -2/+13
The test is already simplified, and I'm not sure how to write a test to exercise the new clause. But it protects the 2-bit pattern from miscompiling as noted in D123453. https://alive2.llvm.org/ce/z/QPyVfv (If we managed to fall into the mul transform, it would wrongly create a zero on this pattern.)
2022-04-11 | [Driver] Simplify hasFlag pattern with addOptInFlag/addOptOutFlag helpers | Fangrui Song | 2 | -0/+17
Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D123468
2022-04-11 | AMDGPU/SDAG: Custom SETCC (i.e. ballot) is always uniform | Nicolai Hähnle | 3 | -26/+13
The AMDGPUISD::SETCC node is like ISD::SETCC, but returns a lane mask instead of a per-lane boolean. The lane mask is uniform. This improves instruction selection for code patterns like ctpop(ballot(x)), which can now use an S_BCNT1_* instruction instead of V_BCNT_*. GlobalISel already selects scalar instructions (an earlier commit added a test case). Differential Revision: https://reviews.llvm.org/D123432
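Here is a host-side simulation of why the result is uniform. Ballot collapses one condition per lane into a single mask shared by the whole wave, so counting its bits is a scalar operation. The 64-lane wave size and the helper names are assumptions of this sketch; wave32 hardware would use a 32-bit mask.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

constexpr int WaveSize = 64; // Assumption: wave64.

// ballot: each lane contributes one bit. The result is a single mask that is
// identical for every lane, i.e. a uniform (scalar) value.
uint64_t ballot(const std::array<bool, WaveSize> &LaneCond) {
  uint64_t Mask = 0;
  for (int L = 0; L < WaveSize; ++L)
    if (LaneCond[L])
      Mask |= uint64_t{1} << L;
  return Mask;
}

// ctpop of a uniform mask needs to be computed only once per wave, which is
// what a scalar S_BCNT1_* instruction does. A per-lane vector instruction
// would compute the same value in every lane.
unsigned popcount64(uint64_t Mask) {
  return static_cast<unsigned>(std::bitset<64>(Mask).count());
}
```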
2022-04-11 | [LoopUnroll] Always respect user unroll pragma | Whitney Tsang | 2 | -41/+12
IMO, when the user provides an unroll pragma, the compiler should always respect it. It is not clear to me why the loop unroll pass currently ensures that the unrolled loop size is limited by PragmaUnrollThreshold. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119148