rocket-tools/riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2022-04-13	[NFC][CodeGen] Use ArrayRef in TargetLowering functions	Shao-Ce SUN	2	-134/+100
	This patch is similar to D122557, adding an `ArrayRef` version for `setOperationAction`, `setLoadExtAction`, `setCondCodeAction`, `setLibcallName`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123467
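	The idea of a list-taking overload can be sketched with a toy class. Everything here (`ToyLowering`, the `Action` enum, the integer opcode and type ids) is hypothetical, and `std::initializer_list` stands in for LLVM's `ArrayRef`; this is not the actual `TargetLowering` code.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

// Hypothetical stand-ins; not LLVM's real types.
enum class Action { Legal, Expand, Custom };

class ToyLowering {
  std::map<std::pair<unsigned, unsigned>, Action> Table;

public:
  // Single-opcode form.
  void setOperationAction(unsigned Op, unsigned VT, Action A) {
    Table[std::make_pair(Op, VT)] = A;
  }

  // List form: applies the same action to every opcode in Ops by
  // forwarding to the single-opcode form.
  void setOperationAction(std::initializer_list<unsigned> Ops, unsigned VT,
                          Action A) {
    for (unsigned Op : Ops)
      setOperationAction(Op, VT, A);
  }

  // Opcodes that were never configured default to Legal in this toy model.
  Action getOperationAction(unsigned Op, unsigned VT) const {
    auto It = Table.find(std::make_pair(Op, VT));
    return It == Table.end() ? Action::Legal : It->second;
  }
};

// One call replaces three near-identical calls.
inline void checkListOverload() {
  ToyLowering TL;
  TL.setOperationAction({10u, 11u, 12u}, /*VT=*/0, Action::Expand);
  assert(TL.getOperationAction(11, 0) == Action::Expand);
  assert(TL.getOperationAction(13, 0) == Action::Legal);
}
```

	A diff of -134/+100 is consistent with call sites collapsing runs of repeated calls into one call of this shape.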
2022-04-12	[AMDGPU][Codegen] Unsupported image sample texture map instructions	Anshil Gandhi	4	-9/+157
	Disables image_sample_*_g16 instructions on architectures lacking g16 support. This patch fixes issue 54672. Differential Revision: https://reviews.llvm.org/D123461
2022-04-12	[SimplifyCFG] cleanup code for converting switch to select (NFC)	Sanjay Patel	1	-33/+33
	This renames functions for more general usage (and current capitalization style) before a proposed logic change in D122485. Differential Revision: https://reviews.llvm.org/D123614
2022-04-12	[AArch64] Async unwind - function epilogues	Momchil Velikov	50	-296/+1914
	Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330
2022-04-12	[AMDGPU] Use default member initializers in Subtarget classes	Jay Foad	5	-275/+137
	Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613
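	A minimal sketch of the hazard this guards against. The class and field names (`ToySubtarget`, `HasFastFMA`, `HasNewFeature`) are hypothetical and not taken from AMDGPUSubtarget.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget class.
class ToySubtarget {
public:
  // Default member initializers keep each flag's initial value next to its
  // declaration, so a newly added flag cannot be left uninitialized by
  // forgetting to update a constructor in a separate .cpp file.
  bool HasFastFMA = false;
  bool HasNewFeature = false;

  ToySubtarget() = default;

  // A constructor only mentions the members it changes; every other
  // member keeps its in-class default.
  explicit ToySubtarget(bool FastFMA) : HasFastFMA(FastFMA) {}
};

inline void checkDefaults() {
  ToySubtarget S(true);
  assert(S.HasFastFMA);
  assert(!S.HasNewFeature); // Initialized even though the ctor omits it.
}
```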
2022-04-12	[gn build] Fix a URL in a comment	Nico Weber	1	-1/+1

2022-04-12	[InstSimplify] Don't fold phi of poison and trapping const expr (PR49839)	Nikita Popov	2	-5/+13
	Folding this case would result in the constant expression being executed unconditionally, which may introduce a new trap. Fixes https://github.com/llvm/llvm-project/issues/49839.
2022-04-12	[InstSimplify] Add test for PR49839 (NFC)	Nikita Popov	1	-0/+42

2022-04-12	[AMDGPU] Split unaligned 3 DWORD DS operations	Stanislav Mekhanoshin	4	-31/+18
	I have written a minitest to check the performance. Overall the benefit of aligned b96 operations on data which is not known to be aligned but happens to be aligned is small, while the performance hit of using b96 operations on really unaligned memory is high. The only exception is when data is not aligned even by 4; it is better to use b96 in this case. Here is the test output on Vega and Navi:
```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-
ds_write_b96 aligned: 3.4 sec
ds_write_b32 + ds_write_b64 aligned: 4.5 sec
ds_write_b32 * 3 aligned: 4.8 sec
ds_write_b96 misaligned by 1: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 7.2 sec
ds_write_b32 * 3 misaligned by 1: 10.0 sec
ds_write_b96 misaligned by 2: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 7.2 sec
ds_write_b32 * 3 misaligned by 2: 10.1 sec
ds_write_b96 misaligned by 4: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.2 sec
ds_write_b32 * 3 misaligned by 4: 4.9 sec
ds_write_b96 misaligned by 8: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 4.6 sec
ds_write_b32 * 3 misaligned by 8: 4.9 sec
ds_read_b96 aligned: 3.3 sec
ds_read_b32 + ds_read_b64 aligned: 4.9 sec
ds_read_b32 * 3 aligned: 2.6 sec
ds_read_b96 misaligned by 1: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 7.2 sec
ds_read_b32 * 3 misaligned by 1: 10.1 sec
ds_read_b96 misaligned by 2: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 7.2 sec
ds_read_b32 * 3 misaligned by 2: 10.1 sec
ds_read_b96 misaligned by 4: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 2.6 sec
ds_read_b32 * 3 misaligned by 4: 2.6 sec
ds_read_b96 misaligned by 8: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 4.9 sec
ds_read_b32 * 3 misaligned by 8: 2.6 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030
ds_write_b96 aligned: 4.1 sec
ds_write_b32 + ds_write_b64 aligned: 13.0 sec
ds_write_b32 * 3 aligned: 4.5 sec
ds_write_b96 misaligned by 1: 12.5 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 22.0 sec
ds_write_b32 * 3 misaligned by 1: 31.5 sec
ds_write_b96 misaligned by 2: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 22.0 sec
ds_write_b32 * 3 misaligned by 2: 31.5 sec
ds_write_b96 misaligned by 4: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.0 sec
ds_write_b32 * 3 misaligned by 4: 4.5 sec
ds_write_b96 misaligned by 8: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 13.0 sec
ds_write_b32 * 3 misaligned by 8: 4.5 sec
ds_read_b96 aligned: 3.8 sec
ds_read_b32 + ds_read_b64 aligned: 12.8 sec
ds_read_b32 * 3 aligned: 4.4 sec
ds_read_b96 misaligned by 1: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 21.8 sec
ds_read_b32 * 3 misaligned by 1: 31.5 sec
ds_read_b96 misaligned by 2: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 21.9 sec
ds_read_b32 * 3 misaligned by 2: 31.5 sec
ds_read_b96 misaligned by 4: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 3.8 sec
ds_read_b32 * 3 misaligned by 4: 4.5 sec
ds_read_b96 misaligned by 8: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 12.8 sec
ds_read_b32 * 3 misaligned by 8: 4.5 sec
```
	Fixes: SWDEV-330802 Differential Revision: https://reviews.llvm.org/D123524
2022-04-12	[AMDGPU] Refactor LDS alignment checks.	Stanislav Mekhanoshin	3	-103/+77
	Move feature/bug checks into a single place, allowsMisalignedMemoryAccessesImpl. This is mostly NFCI except for the order of selection in a couple of places. A separate change may be needed to stop lying about Fast. Differential Revision: https://reviews.llvm.org/D123343
2022-04-12	[X86] getFauxShuffleMask - remove use DemandedElts TODO	Simon Pilgrim	1	-1/+0
	Most of the getTargetShuffleInputs recursive calls have now gone and the remaining uses aren't likely to benefit from a DemandedElts mask
2022-04-12	[ValueTracking] Make getStringLength aware of strdup	serge-sans-paille	5	-26/+60
	During strlen compile-time evaluation, make it possible to track size of strduped strings. Differential Revision: https://reviews.llvm.org/D123497
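	A minimal sketch of the source pattern this targets, assuming a POSIX `strdup`. The function name `lengthOfDuplicate` is made up for illustration, and whether a given compiler version actually folds this call at compile time is not claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string.h> // strdup is POSIX and declared in <string.h>.

// The length of a strdup'ed string equals the length of its source, so the
// strlen call below is a candidate for compile-time evaluation once the
// optimizer can see through strdup.
inline std::size_t lengthOfDuplicate() {
  char *Copy = strdup("hello");
  if (!Copy)
    return 0; // Allocation failed; nothing to measure.
  std::size_t Len = strlen(Copy);
  std::free(Copy);
  return Len;
}
```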
2022-04-12	[AMDGPU][DOC][NFC] Updated GFX10 assembler syntax description	Dmitry Preobrazhensky	97	-2217/+2228
	The description has been updated to reflect AMDGPU MC changes:
	- enabled literals for src0 of v_fmaak_f, v_fmamk_f, v_madak_f32, v_madmk_f32;
	- enabled global_atomic_fcmpswap and global_atomic_fcmpswap_x2;
	- enabled dlc with flat_atomic* and global_atomic_*.
	Bug fixing and improvements:
	- enabled s_wait_idle;
	- enabled s_waitcnt_depctr;
	- added description of s_waitcnt_depctr syntactic sugar;
	- disabled SYSMSG_OP_HOST_TRAP_ACK (it is not supported on GFX10);
	- corrected description of lgkmcnt (accept values from 0 to 63).
2022-04-12	[AMDGPU][DOC][NFC] Updated GFX1030 assembler syntax description	Dmitry Preobrazhensky	16	-1039/+1062
	Summary of changes:
	- enabled null for VOP operands;
	- added description of s_waitcnt_depctr syntactic sugar.
2022-04-12	[DAG] Add non-uniform vector support to (shl (sr[la] exact X, C1), C2) folds	Simon Pilgrim	2	-60/+37

2022-04-12	[X86] Fix extact -> exact typo in test names	Simon Pilgrim	1	-16/+16

2022-04-12	[gn build] Port 95f0f69f1ff8	LLVM GN Syncbot	1	-1/+0

2022-04-12	[InlineCost] Check that function types match	Nikita Popov	2	-3/+30
	Retain the behavior we get without opaque pointers: A call to a known function with different function type is considered an indirect call. This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
2022-04-12	[gn build] Port 5a5be4044f0b	LLVM GN Syncbot	1	-0/+1

2022-04-12	workflow: When updating the issueXX branch, use force push	Tobias Hieta	1	-1/+1
	Otherwise, if you try to update the branch with a new /cherry-pick from the same issue, you will run into problems similar to the one shown in this workflow: https://github.com/llvm/llvm-project/runs/5864672298?check_suite_focus=true Reviewed By: tstellar Differential Revision: https://reviews.llvm.org/D123365
2022-04-12	[llvm-pdbutil] Fix broken '-modi' option after change D122226.	Carlos Alberto Enciso	5	-6/+48
	The change in https://reviews.llvm.org/D122226 moved some llvm-pdbutil functionality to the debug PDB library. This patch fixes the broken handling of the '-modi' argument, which causes an assertion if its value is other than '0' or '1'. In addition, it moves the assertion for the number of occurrences of the '-modi' argument from the PDB library into the llvm-pdbutil driver. Reviewed By: zequanwu Differential Revision: https://reviews.llvm.org/D123483
2022-04-12	[AMDGPU] Graceful abort for waterfalls in SIOptimizeVGPRLiveRange	Carl Ritson	2	-3/+129
	If the CFG structure of a waterfall loop is not the expected shape, then gracefully abort traversing the IR for the given loop. This applies to nested waterfall loops, which are not supported by the VGPR live range optimizer. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D123480
2022-04-12	[AMDGPU] Pre-commit test for D123569. NFC.	Carl Ritson	1	-0/+81

2022-04-12	[InstCombine] fold more constant remainder to select-of-constants remainder	Liqin Weng	2	-11/+8
	Reviewed By: xbolva00, spatel, Chenbing.Zheng Differential Revision: https://reviews.llvm.org/D123486
2022-04-12	[InstCombine] Fold icmp(X) ? f(X) : C	Alexander Shaposhnikov	2	-10/+38
	This diff extends foldSelectInstWithICmp to handle the case icmp(X) ? f(X) : C when f(X) is guaranteed to be equal to C for all X in the exact range of the inverse predicate. This addresses the issue https://github.com/llvm/llvm-project/issues/54089. Differential revision: https://reviews.llvm.org/D123159 Test plan: make check-all
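	The condition can be checked by brute force on one concrete instance. The choice of icmp(X) as `X != 0`, f(X) as `X * 2` and C as `0` is an assumption picked for illustration, not a pattern taken from the patch or its tests. The inverse predicate `X == 0` has the exact range {0}, and f(0) = 0 = C, so the select should be replaceable by f(X) alone.

```cpp
#include <cassert>
#include <cstdint>

// Original shape: icmp(X) ? f(X) : C.
inline uint8_t selectForm(uint8_t X) {
  return X != 0 ? static_cast<uint8_t>(X * 2) : static_cast<uint8_t>(0);
}

// Folded shape: f(X) alone, with wrapping 8-bit multiplication.
inline uint8_t foldedForm(uint8_t X) { return static_cast<uint8_t>(X * 2); }

// Exhaustively compare both shapes over all 8-bit inputs.
inline bool formsAgree() {
  for (unsigned V = 0; V <= 255; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    if (selectForm(X) != foldedForm(X))
      return false;
  }
  return true;
}

inline void checkFold() { assert(formsAgree()); }
```

	The exhaustive loop only demonstrates that the identity holds for this instance; it says nothing about how foldSelectInstWithICmp derives it.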
2022-04-12	[InstCombine][NFC] Add baseline tests for folds icmp(X) ? f(X) : C	Alexander Shaposhnikov	1	-0/+65
	Differential revision: https://reviews.llvm.org/D123430 Test plan: make check-all
2022-04-11	[SelectionDAG] Remove unnecessary null check after call to getNode. NFC	Craig Topper	1	-3/+2
	As far as I know getNode will never return a null SDValue. I'm guessing this was modeled after the FoldConstantArithmetic call earlier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123550
2022-04-11	GlobalISel: Verify atomic load/store ordering restriction	Matt Arsenault	4	-2/+28
	Reject acquire stores and release loads. This matches the restriction imposed by the LLParser and IR verifier.
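	The restriction can be sketched as a pair of predicates. The `Ordering` enum and function names are hypothetical stand-ins rather than the verifier's code, and the `AcquireRelease` cases are included on the assumption that the check mirrors the IR verifier, which also rejects acq_rel on plain loads and stores.

```cpp
#include <cassert>

// Hypothetical mirror of the atomic orderings used in LLVM IR.
enum class Ordering {
  Unordered,
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// A load has no write side, so release semantics are rejected.
inline bool isValidLoadOrdering(Ordering O) {
  return O != Ordering::Release && O != Ordering::AcquireRelease;
}

// A store has no read side, so acquire semantics are rejected.
inline bool isValidStoreOrdering(Ordering O) {
  return O != Ordering::Acquire && O != Ordering::AcquireRelease;
}

inline void checkOrderings() {
  assert(isValidLoadOrdering(Ordering::Acquire));
  assert(!isValidLoadOrdering(Ordering::Release));
  assert(isValidStoreOrdering(Ordering::Release));
  assert(!isValidStoreOrdering(Ordering::Acquire));
}
```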
2022-04-11	AArch64/GlobalISel: Regenerate mir test checks	Matt Arsenault	56	-4168/+4753
	Minimizes the test diffs in future changes from introduction of -NEXT.
2022-04-11	[gn build] Port 203a1e36ed75	LLVM GN Syncbot	1	-1/+0

2022-04-11	GlobalISel: Add memSizeNotByteSizePow2 legality helper	Matt Arsenault	5	-8/+30
	This is really a replacement for memSizeInBytesNotPow2 that actually does what almost every target wants. In particular, since s1 rounds to 1 byte, it wasn't lowered by the old predicate. This results in targets needing to think harder and add more matchers to catch all the degenerate cases. Also includes a small bug fix that prevented the correct insertion of G_ASSERT_ZEXT in the AArch64 use case.
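	The difference between the two predicates can be modeled on plain bit widths. This is a toy sketch of the behavior described above, not the LLVM implementation (which operates on a legality query rather than an integer); the helper `isPowerOf2` is defined locally.

```cpp
#include <cassert>

inline bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Old behavior as described: round the bit width up to whole bytes first,
// then test the byte count. s1 rounds to 1 byte and so is not matched.
inline bool memSizeInBytesNotPow2(unsigned Bits) {
  return !isPowerOf2((Bits + 7) / 8);
}

// New behavior as described: match anything that is not byte-sized, or
// whose byte count is not a power of two.
inline bool memSizeNotByteSizePow2(unsigned Bits) {
  return Bits % 8 != 0 || !isPowerOf2(Bits / 8);
}

inline void checkPredicates() {
  assert(!memSizeInBytesNotPow2(1)); // s1 slips past the old predicate.
  assert(memSizeNotByteSizePow2(1)); // The new predicate catches it.
  assert(memSizeInBytesNotPow2(24) && memSizeNotByteSizePow2(24));
  assert(!memSizeInBytesNotPow2(32) && !memSizeNotByteSizePow2(32));
}
```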
2022-04-11	GlobalISel: Implement computeKnownBits for overflow bool results	Matt Arsenault	20	-2549/+1465

2022-04-11	AMDGPU/GlobalISel: Add some additional IR tests for zextload	Matt Arsenault	1	-0/+115

2022-04-11	AMDGPU/GlobalISel: Add more tests for inreg extend + load combine	Matt Arsenault	2	-1/+354

2022-04-11	Mips/GlobalISel: Remove test IR sections and regenerate checks	Matt Arsenault	4	-169/+153

2022-04-11	AArch64/GlobalISel: Remove IR section from a test	Matt Arsenault	1	-75/+35

2022-04-11	AMDGPU/GlobalISel: Remove unused parameter	Matt Arsenault	3	-38/+30

2022-04-11	Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass"	Matt Arsenault	7	-108/+2
	This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d. The unrelated failure this exposed was fixed.
2022-04-11	AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5	Changpeng Fang	4	-99/+72
	Summary: When emitting metadata for implicit kernel arguments, we need to stay in sync with the actual loads and align the implicit kernel argument segment to an 8-byte boundary. In this work, we simply force this alignment through the first implicit argument. In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used. Reviewers: arsenm, b-sumner Differential Revision: https://reviews.llvm.org/D123346
2022-04-11	[VFS] RedirectingFileSystem only replace path if not already mapped	Ben Barham	3	-12/+52
	If the `ExternalFS` has already remapped to an external path then `RedirectingFileSystem` should not change it to the originally provided path. This fixes the original path always being used if multiple VFS overlays were provided and the path wasn't found in the highest overlay (i.e. the first in the chain). For now this is accomplished through the use of a new `ExposesExternalVFSPath` field on `vfs::Status`. This flag is true when the `Status` has an external path that's different from its virtual path, i.e. the contained path is the external path. See the plan in `FileManager::getFileRef` for where this is going - eventually we won't need `IsVFSMapped` any more and all returned paths should be virtual. Resolves rdar://90578880 and llvm-project#53306. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D123398
2022-04-11	[CMake][gn][Bazel] Remove HAVE_PTHREAD_GETSPECIFIC	Fangrui Song	3	-7/+0
	The only user was removed by d351f54a076edf24c2a2bfda7cc7e3313ee3eecf.
2022-04-11	[RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64.	Craig Topper	6	-27/+35
	Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951
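	Why the choice of extension matters can be seen on a single constant. The helpers below are hypothetical and only model the arithmetic; they are not the hook added by this patch. The 12-bit range is the signed immediate range of RISC-V `addi`.

```cpp
#include <cassert>
#include <cstdint>

// Widen an i32 bit pattern to 64 bits in the two possible ways.
inline int64_t zeroExtend32(uint32_t C) {
  return static_cast<int64_t>(static_cast<uint64_t>(C));
}
inline int64_t signExtend32(uint32_t C) {
  return static_cast<int64_t>(static_cast<int32_t>(C));
}

// True if V fits the signed 12-bit immediate of a single addi.
inline bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

// True if V lies in the signed 32-bit range.
inline bool fitsSImm32(int64_t V) {
  return V >= INT32_MIN && V <= INT32_MAX;
}

inline void checkExtension() {
  const uint32_t AllOnes = 0xFFFFFFFFu; // i32 -1
  // Sign extended, the value is -1 and fits one addi immediate.
  assert(signExtend32(AllOnes) == -1);
  assert(fitsSImm12(signExtend32(AllOnes)));
  // Zero extended, the same bits become 4294967295, which is outside
  // even the signed 32-bit range.
  assert(zeroExtend32(AllOnes) == 4294967295LL);
  assert(!fitsSImm32(zeroExtend32(AllOnes)));
}
```

	A sign-extended i32 always stays within the signed 32-bit range, while a zero-extended one leaves it whenever bit 31 is set.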
2022-04-11	[Support] Remove unused/uncompilable !HAVE_PTHREAD_GETSPECIFIC code path	Fangrui Song	1	-12/+0
	lib/Support/ThreadLocal.cpp has been uncompilable since rL158346 (2012-06) when `data` became a char array. The error looks like
```
...llvm/lib/Support/Unix/ThreadLocal.inc:66:57: error: array type 'char[8]' is not assignable
void ThreadLocalImpl::setInstance(const void* d) { data = const_cast<void*>(d);}
```
2022-04-11	Value::isTransitiveUsedByMetadataOnly: Don't repeatedly add an element to the worklist. NFC	Fangrui Song	1	-7/+3

2022-04-11	[test] Remove references to -fexperimental-new-pass-manager in tests	Arthur Eubanks	12	-12/+12
	This has been the default for a while and we're in the process of removing the legacy PM optimization pipeline.
2022-04-11	AArch64 adding more tests to show the simple scenarios for or/and combine	Biplob Mishra	1	-0/+68

2022-04-11	[InstCombine] guard against splat-mul corner case	Sanjay Patel	2	-2/+13
	The test is already simplified, and I'm not sure how to write a test to exercise the new clause. But it protects the 2-bit pattern from miscompiling as noted in D123453. https://alive2.llvm.org/ce/z/QPyVfv (If we managed to fall into the mul transform, it would wrongly create a zero on this pattern.)
2022-04-11	[Driver] Simplify hasFlag pattern with addOptInFlag/addOptOutFlag helpers	Fangrui Song	2	-0/+17
	Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D123468
2022-04-11	AMDGPU/SDAG: Custom SETCC (i.e. ballot) is always uniform	Nicolai Hähnle	3	-26/+13
	The AMDGPUISD::SETCC node is like ISD::SETCC, but returns a lane mask instead of a per-lane boolean. The lane mask is uniform. This improves instruction selection for code patterns like ctpop(ballot(x)), which can now use an S_BCNT1_* instruction instead of V_BCNT_*. GlobalISel already selects scalar instructions (an earlier commit added a test case). Differential Revision: https://reviews.llvm.org/D123432
2022-04-11	[LoopUnroll] Always respect user unroll pragma	Whitney Tsang	2	-41/+12
	IMO, when the user provides an unroll pragma, the compiler should always respect it. It is not clear to me why the loop unroll pass currently ensures that the unrolled loop size is limited by PragmaUnrollThreshold. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119148