rocket-tools/riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2023-12-18	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185) upstream/users/clementval/acc_device_type_none	James Y Knight	3	-16/+28
	This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.
2023-12-18	[X86AsmParser] Check displacement overflow (#75747)	Fangrui Song	5	-32/+99
	A displacement is an 8-, 16-, or 32-bit value. LLVM integrated assembler silently encodes an out-of-range displacement. GNU assembler checks the displacement and may report a warning or error (error is for 64-bit addressing, done as part of https://sourceware.org/PR10636).
```
movq 0x80000000(%rip), %rax
Error: 0x80000000 out of range of signed 32bit displacement
movq -0x080000001(%rax), %rax
Error: 0xffffffff7fffffff out of range of signed 32bit displacement
movl 0x100000001(%eax), %eax
Warning: 0x100000001 shortened to 0x1
```
For 32-bit addressing, GNU assembler gives no diagnostic when the displacement is within `[-2**32,2**32)`. 16-bit addressing is similar.
```
movl 0xffffffff(%eax), %eax  # no diagnostic
movl -0xffffffff(%eax), %eax # no diagnostic
```
Supporting a larger range is probably because wraparound using a large constant is more reasonable. E.g. Linux kernel arch/x86/kernel/head_32.S has `leal -__PAGE_OFFSET(%ecx),%esp` where `__PAGE_OFFSET` is 0xc0000000. This patch implements a similar behavior.
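The GNU assembler behavior described above can be modeled in a few lines. This is only a sketch of the rules as the commit message states them, not the code added to X86AsmParser; the function names `wrap_to_int32` and `check_displacement` are hypothetical, and 16-bit addressing is left out because the message only says it is "similar".

```python
def wrap_to_int32(value):
    """Reduce an integer to its signed 32-bit two's-complement value."""
    return (value + 2**31) % 2**32 - 2**31


def check_displacement(disp, addr_bits):
    """Return (diagnostic, encoded) for a displacement.

    diagnostic is "ok", "warning" or "error"; encoded is the signed
    32-bit value that would be emitted, or None when it is rejected.
    """
    if addr_bits == 64:
        # 64-bit addressing: anything outside signed 32-bit is an error.
        if -2**31 <= disp < 2**31:
            return ("ok", disp)
        return ("error", None)
    if addr_bits == 32:
        # 32-bit addressing: [-2**32, 2**32) wraps silently,
        # anything beyond that is shortened with a warning.
        diagnostic = "ok" if -2**32 <= disp < 2**32 else "warning"
        return (diagnostic, wrap_to_int32(disp))
    raise ValueError("only 32- and 64-bit addressing are modeled here")


print(check_displacement(0x80000000, 64))   # rejected, as in the movq example
print(check_displacement(0x100000001, 32))  # shortened to 0x1 with a warning
print(check_displacement(-0xc0000000, 32))  # the -__PAGE_OFFSET case wraps silently
```

The wider silent range for 32-bit addressing is what lets a constant such as `-__PAGE_OFFSET` pass without a diagnostic while still encoding the wrapped value.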
2023-12-18	asan_static x86-64: Support 64-bit ASAN_SHADOW_OFFSET_CONST (#75748)	Fangrui Song	1	-0/+5
	Fix #57086: when ASAN_SHADOW_OFFSET_CONST >= 0x80000000 (FreeBSD, NetBSD, etc), `movsbl ASAN_SHADOW_OFFSET_CONST(%r10),%r10d` has an invalid displacement (not representable as a signed 32-bit integer), which will be diagnosed by GNU assembler.
```
% cat a.s
movsbl 0x80000000(%r10),%r10d
% as a.s
a.s: Assembler messages:
a.s:1: Error: 0x80000000 out of range of signed 32bit displacement
% clang -c a.s
```
The integrated assembler after #75747 will diagnose the invalid displacement as well.
```
% clang -c a.s
a.s:1:19: error: displacement 2147483648 is not within [-2147483648, 2147483647]
movsbl 0x80000000(%r10),%r10d
                  ^
```
If ASAN_SHADOW_OFFSET_CONST cannot be encoded as a displacement, switch to `movabsq+movsbl`.
2023-12-18	[TSAN] add instrumentation for pthread_mutex_clocklock (#75713)	Yvan	2	-0/+44
	The function `pthread_mutex_clocklock` is not supported by TSAN yet, as mentioned in [llvm/llvm-project/issues/62623](https://github.com/llvm/llvm-project/issues/62623#issue-1701600538). This patch adds handling for this function.
2023-12-18	[TableGen] AsmParser: Keep consistent naming. NFC	Michael Liao	1	-1/+1

2023-12-18	[clang][fatlto] Don't set ThinLTO module flag with FatLTO (#75079)	Paul Kirth	2	-6/+2
	Since FatLTO now uses the UnifiedLTO pipeline, we should not set the ThinLTO module flag to true, since it may cause an assertion failure. See https://github.com/llvm/llvm-project/issues/70703 for context.
2023-12-18	[DWP] Fix default for continue-on-cu-index-overflow (#75540)	Alexander Yermolovich	3	-10/+29
	This is a follow-up to https://github.com/llvm/llvm-project/pull/71902. Passing --continue-on-cu-index-overflow without a value returned the error `--continue-on-cu-index-overflow: missing argument`. It now behaves like other flags such as -gsplit-dwarf: --continue-on-cu-index-overflow defaults to continue, and the user can set the mode with --continue-on-cu-index-overflow=<value>.
2023-12-18	[libc++] Fix the handling of `views::take` for `iota_view` (#75683)	A. Jiang	2	-16/+23
	Currently, when libc++'s views::take specially handles an iota_view, the addition is done after dereferencing the beginning iterator. However, in [range.take.overview]/2.3, the addition is done before the dereferencing, which means that the standard requires the returned iota_view to have the same W and Bound type in such cases. This patch fixes that, and also fixes a test that was testing the incorrect behavior. Fixes #75611
2023-12-18	Revert "[SLP][NFC]Check for equal opcode preliminary to meet weak strict order"	Alexey Bataev	1	-2/+0
	This reverts commit 58a2c4e2f24ffce3966c3988d1a4ca7b04c52244 to fix the issue detected by https://lab.llvm.org/buildbot/#/builders/233/builds/5424.
2023-12-18	[sanitizer] [Darwin] Disable InstallAtForkHandler	Azharuddin Mohammed	3	-2/+3
	This is a followup to d01be3c63109986627c1c029d6d0130f76a63a2f.
2023-12-18	[libc] expose aux vector (#75806)	Schrodinger ZHU Yifan	4	-28/+26
	This patch lifts aux vector related definitions to app.h. Because startup's refactoring is in progress, this patch still contains duplicated changes. This problem will be addressed very soon in an incoming patch.
2023-12-18	[lldb] Fix a quirk in SBValue::GetDescription (#75793)	Pavel Labath	4	-3/+41
	The function was using the default version of ValueObject::Dump, which has a default of using the synthetic-ness of the top-level value for determining whether to print _all_ values as synthetic. This resulted in some unusual behavior, where e.g. a std::vector is stringified as synthetic if it's dumped as the top-level object, but in its raw form if it is a member of a struct without a pretty printer. The SBValue class already has properties which determine whether one should be looking at the synthetic view of the object (and also whether to use dynamic types), so it seems more natural to use that.
2023-12-18	[bazel] Port a0a3c793d212ffc70fdba4c94b024114d11532af	James Y Knight	1	-0/+15

2023-12-18	[clang][lex] Fix non-portability diagnostics with absolute path (#74782)	Jan Svoboda	2	-7/+28
	The existing code incorrectly assumes that `Path` can be empty. It can't: it always contains at least `<` or `"`. On Unix, this patch fixes an incorrect diagnostic that, instead of `"/Users/blah"`, suggested `"Userss/blah"`. In assert builds, this would outright crash. This patch also fixes a bug on Windows that would prevent the diagnostic from being triggered due to a separator mismatch. rdar://91172342
2023-12-18	[RISCV][GISel] Fix a bug exposed from compilation warnings. NFC	Michael Liao	1	-5/+5
	- G_MERGE_VALUES and G_UNMERGE_VALUES need type pairs instead of type.
2023-12-18	[InstrRef][NFC] Delete unused variables (#75501)	Felipe de Azevedo Piovezan	1	-9/+2
	`V` was unused, and all the other deletions follow from that observation.
2023-12-18	[libc++] Add libc++ clang-formatting commit to git-blame-ignore-revs file	Louis Dionne	1	-0/+3

2023-12-18	[libc++] Format the code base (#74334)	Louis Dionne	542	-84859/+67513
	This patch runs clang-format on all of libcxx/include and libcxx/src, in accordance with the RFC discussed at [1]. Follow-up patches will format the benchmarks, the test suite and remaining parts of the code. I'm splitting this one into its own patch so the diff is a bit easier to review. This patch was generated with:
    find libcxx/include libcxx/src -type f \
        | grep -v 'module.modulemap.in' \
        | grep -v 'CMakeLists.txt' \
        | grep -v 'README.txt' \
        | grep -v 'libcxx.imp' \
        | grep -v '__config_site.in' \
        | xargs clang-format -i
A Git merge driver is available in libcxx/utils/clang-format-merge-driver.sh to help resolve merge and rebase issues across these formatting changes.
[1]: https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all
2023-12-18	[AMDGPU] Produce better memoperand for LDS DMA (#75247)	Stanislav Mekhanoshin	1	-6/+10
	1) It was marked as volatile. This is not needed and the only reason it was done is because it is both load and store and handled together with atomics. Global load to LDS was marked as volatile just because buffer load was done that way. 2) Preserve at least LDS (store) pointer which we always have with the intrinsics. 3) Use PoisonValue instead of nullptr for load memop as a Value.
2023-12-18	[AMDGPU] Fix lack of LDS DMA check in the AA handling (#75249)	Stanislav Mekhanoshin	1	-0/+3
	SIInstrInfo::areMemAccessesTriviallyDisjoint does DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without them, the code falls through and returns true, which is wrong. As a result, mayAlias would always return false for LDS DMA and a regular LDS instruction, or for 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreachable now because of the ordered memory ref checks just above in the function and because LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247.
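The shape of the problem can be shown with a simplified model: an offset-based disjointness test that gives up early for accesses it cannot reason about. This is a sketch under stated assumptions, not the SIInstrInfo code; `MemAccess`, its fields, and `trivially_disjoint` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MemAccess:
    offset: int               # byte offset from a shared base
    width: int                # number of bytes accessed
    is_lds_dma: bool = False  # hypothetical flag for LDS DMA instructions


def trivially_disjoint(a: MemAccess, b: MemAccess) -> bool:
    """Return True only when the two accesses provably do not overlap."""
    # Conservative early exit: plain offsets do not describe where an
    # LDS DMA instruction writes, so never claim disjointness for it.
    if a.is_lds_dma or b.is_lds_dma:
        return False
    low, high = (a, b) if a.offset <= b.offset else (b, a)
    return low.offset + low.width <= high.offset
```

Without the early exit, an LDS DMA access would be compared by offset alone and could be reported as disjoint, which is the incorrect "returns true" path the message describes.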
2023-12-18	[HLSL][DirectX] Move handling of resource element types into the frontend	Justin Bogner	11	-138/+204
	Rather than shepherding a type name all the way to the backend as a string and attempting to parse it, get the element type out of the AST and store that in the resource annotation metadata directly. Pull Request: https://github.com/llvm/llvm-project/pull/75674
2023-12-18	[-Wunsafe-buffer-usage] Add a subgroup `-Wunsafe-buffer-usage-in-container` ↵	Ziqing Luo	1	-1/+2
	(#75665) Add a sub diagnostic group under `-Wunsafe-buffer-usage` controlled by `-Wunsafe-buffer-usage-in-container`. The subgroup will include warnings on misuses of `std::span`, `std::vector`, and `std::array`.
2023-12-18	[gn build] Manually port 945c645a and a0a3c793	Arthur Eubanks	2	-0/+18

2023-12-18	[libc] Improve get_object_files_for_test to reduce CMake configure time for ↵	lntue	1	-30/+49
	tests. (#75552) Profiling cmake shows that a significant time configuring `libc` folder is spent on running `get_object_files_for_test` in the `test` folder (13 sec in `libc/test` folder / 16 sec in `libc` folder). By caching all needed objects for each target instead of resolving every time, the time cmake spends on configuring `libc/test` folder is reduced to ~1s.
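The caching idea is ordinary memoization over the dependency graph. The real change is written in CMake; the Python below is only an illustration of the technique, and the target names, the `deps` and `own_objects` tables, and `collect_objects` are all hypothetical. It assumes the dependency graph has no cycles.

```python
def collect_objects(target, deps, own_objects, cache):
    """Return every object file needed by target, resolving each target once."""
    if target in cache:
        return cache[target]
    objects = set(own_objects.get(target, []))
    for dep in deps.get(target, []):
        # Shared dependencies are answered from the cache after the first visit.
        objects |= collect_objects(dep, deps, own_objects, cache)
    cache[target] = frozenset(objects)
    return cache[target]


deps = {
    "test_a": ["strlen", "memcpy"],
    "strlen": ["common"],
    "memcpy": ["common"],
    "common": [],
}
own_objects = {
    "strlen": ["strlen.o"],
    "memcpy": ["memcpy.o"],
    "common": ["common.o"],
}
cache = {}
print(sorted(collect_objects("test_a", deps, own_objects, cache)))
```

Reusing one `cache` across all test targets is what turns repeated graph walks into single lookups.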
2023-12-18	[MLIR][Linalg] Support dynamic sizes in `lower_unpack` (#75494)	srcarroll	2	-9/+132

2023-12-18	[llvm-objdump] --disassemble-symbols: skip inline relocs from symbols that ↵	Fangrui Song	4	-9/+14
	are not dumped (#75724) When a section contains two functions x1 and x2, we incorrectly display x1's relocations when dumping x2 for `--disassemble-symbols=x2 -r`. Fix #75539 by ignoring these relocations.
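The intended behavior amounts to filtering relocations by the address range of the symbol being dumped. The sketch below shows that filter in isolation; it does not reproduce how llvm-objdump iterates over relocations, and the symbol table layout, relocation tuples, and `relocs_for_symbol` are hypothetical.

```python
def relocs_for_symbol(symbols, relocs, name):
    """Keep only relocations whose offset lies inside the named symbol.

    symbols maps a name to (start, size); relocs is a list of
    (offset, type, target) tuples for the containing section.
    """
    start, size = symbols[name]
    end = start + size
    return [r for r in relocs if start <= r[0] < end]


symbols = {"x1": (0x0, 0x8), "x2": (0x8, 0x8)}
relocs = [
    (0x1, "R_X86_64_PLT32", "foo"),  # belongs to x1
    (0x9, "R_X86_64_PLT32", "bar"),  # belongs to x2
]
print(relocs_for_symbol(symbols, relocs, "x2"))
```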
2023-12-18	[LTO] Improve diagnostics handling when parsing module-level inline assembly ↵	Fangrui Song	5	-6/+26
	(#75726) Non-LTO compiles set the buffer name to "<inline asm>" (`AsmPrinter::addInlineAsmDiagBuffer`) and pass diagnostics to `ClangDiagnosticHandler` (through the `MCContext` handler in `MachineModuleInfoWrapperPass::doInitialization`) to ensure that the exit code is 1 in the presence of errors. In contrast, LTO compiles spuriously succeed even if error messages are printed.
```
% cat a.c
void _start() {}
asm("unknown instruction");
% clang -c a.c
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 error generated.
% clang -c -flto a.c; echo $?  # -flto=thin is the same
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
error: invalid instruction mnemonic 'unknown'
unknown instruction
^~~~~~~
0
```
`CollectAsmSymbols` parses inline assembly and is transitively called by both `ModuleSummaryIndexAnalysis::run` and `WriteBitcodeToFile`, leading to duplicate diagnostics. This patch updates `CollectAsmSymbols` to be similar to non-LTO compiles.
```
% clang -c -flto=thin a.c; echo $?
<inline asm>:1:1: error: invalid instruction mnemonic 'unknown'
    1 | unknown instruction
      | ^
1 errors generated.
1
```
The `HasErrors` check does not prevent duplicate warnings but assembler warnings are very uncommon.
2023-12-18	[mlir][memref] Make `LoadOp::verify` error more clear (#75831)	Rik Huijzer	2	-2/+13
	While debugging https://github.com/llvm/llvm-project/issues/71326, the `LoadOp::verify` code and error were very confusing. This PR improves that. This code was part of the reverted PR https://github.com/llvm/llvm-project/pull/75519. Fixing the `-convert-vector-to-scf` issue is going to take a bit longer and this code was out of scope anyway. Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2023-12-18	Revert "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use ↵	Mingming Liu	17	-363/+128
	semicolon as delimiter for local-linkage variables." (#75835) Reverts llvm/llvm-project#74008. The compiler-rt test failed because `llvm-dis` was not found (https://lab.llvm.org/buildbot/#/builders/127/builds/59884). Will revert and investigate how to require the proper dependency.
2023-12-18	[OpenMP] Directly use user's grid and block size in kernel language mode ↵	Shilei Tian	3	-0/+49
	(#70612) In kernel language mode, use the user's grid and block size directly. No validity check is performed, which means that if the user's values are too large, the launch will fail, similar to what CUDA and HIP are doing right now.
2023-12-18	[LinkerWrapper] Forward more arguments to the CPU offloading linker (#75757)	Joseph Huber	2	-7/+29
	Summary: The CPU target currently inherits all the libraries from the normal link job to ensure that it has access to the same environment that the host does. However, this previously did not respect libraries that are passed by name rather than via `-l`, nor the whole-archive flags. This patch fixes this to allow the CPU linker to correctly pick up the libraries associated with things like address sanitizers. Fixes: https://github.com/llvm/llvm-project/issues/75651
2023-12-18	[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon ↵	Mingming Liu	17	-128/+363
	as delimiter for local-linkage variables. (#74008) Since commit fe05193 (phab D156569), IRPGO names use the format `[<filepath>;]<linkage-name>`, while the prior format is `[<filepath>:]<mangled-name>`. The format change would break the use case demonstrated in (updated) `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`. This patch changes `GlobalValues::getGlobalIdentifier` to use the semicolon. To elaborate on how things break without this PR:
1. IRPGO raw profiles store (compressed) IRPGO names of functions in one section, and per-function profile data in another section. The [NameRef](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/compiler-rt/include/profile/InstrProfData.inc#L72) field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the profiled address is [mapped](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/ProfileData/InstrProf.cpp#L876-L885) to the MD5 hash of the callee.
3. In the `pgo-instr-use` thin-lto prelink pipeline, the MD5 hash of IRPGO names will be [annotated](https://github.com/llvm/llvm-project/blob/fc715e4cd942612a091097339841733757b53824/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L1707) as value profiles, and used to import indirect-call-prom candidates. If the annotated MD5 hash is computed from the new format while import uses the prior format, the callee cannot be imported.
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` is added to have an end-to-end test. `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` is updated to have better test coverage from another aspect (as runtime tests are more sensitive to the environment and may be skipped by some contributors).
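The mismatch in step 3 follows directly from hashing two different strings. The sketch below builds a local-linkage name with each delimiter and hashes it; the file path, the symbol name, and both helper functions are hypothetical, and reducing the MD5 digest to its first 8 bytes read as a little-endian integer is an assumption about how the 64-bit value is derived.

```python
import hashlib


def irpgo_style_name(filepath, symbol, delimiter):
    """Prefix the symbol with its file path when one is given (local linkage)."""
    return f"{filepath}{delimiter}{symbol}" if filepath else symbol


def name_hash(name):
    """64-bit value taken from the MD5 digest of the name (assumed truncation)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


new_style = irpgo_style_name("a.cpp", "_ZL3foov", ";")
old_style = irpgo_style_name("a.cpp", "_ZL3foov", ":")
# Different delimiters give different names, hence different hashes, so a
# lookup keyed on one format misses an entry recorded with the other.
print(hex(name_hash(new_style)), hex(name_hash(old_style)))
```

Names without a file path prefix are unaffected, which is why only local-linkage symbols are involved.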
2023-12-18	Revert "[InstCombine] Favour `m_Poison` in `SimplifyDemandedVectorElts`"	Nikita Popov	18	-143/+131
	This reverts commit 318d5bff0b65aa7d52fc7004d49587416f0fb564. Has incomplete test updates.
2023-12-18	[Libomptarget] Remove remaining global constructors in plugins (#75814)	Joseph Huber	1	-11/+9
	Summary: This patch fixes the remaining global constructor in the plugins after addressing the ones in the JIT interface. This struct was mistakenly using global constructors as not all the members were being initialized properly. This was almost certainly being optimized out because it's trivial, but would still be present in debug builds and prevented us from compiling with `-Werror=global-constructors`. We will want to do that once offloading is moved to a runtimes only build.
2023-12-18	[AArch64][SME2] Enable bfm builtins for sme2 (#71927)	Sam Tebbs	3	-10/+28
	This patch enables the following builtins for SME2: svbfmlslb_f32, svbfmlslb_lane_f32, svbfmlslt_f32, svbfmlslt_lane_f32. Patch by: Kerry McLaughlin <kerry.mclaughlin@arm.com> Co-authored-by: Matthew Devereau <matthew.devereau@arm.com>
2023-12-18	[Clang][SVE2.1] Update names of the `svwhileXX` builtins with ↵	Momchil Velikov	3	-125/+135
	predicate-as-counter (#75200) The `_s64`/`_u64` part can be omitted now and the name variants do not include unsigned comparison mnemonics. Both are inferred from the argument types.
2023-12-18	[libc++][modules] Adds CMake 3.28 support. (#75700)	Mark de Wever	1	-4/+8
	This is a preparation to start using CMake 3.28 in the CI.
2023-12-18	[Clang][SME2] Add multi-vector zip & unzip builtins (#74841)	Kerry McLaughlin	5	-0/+3195
	Adds the following SME2 builtins:
- svzip (x2 & x4)
- svzipq (x2 & x4)
- svuzp (x2 & x4)
- svuzpq (x2 & x4)
See https://github.com/ARM-software/acle/pull/217/files Patch by David Sherwood <david.sherwood@arm.com>
2023-12-18	[InstCombine] Favour `m_Poison` in `SimplifyDemandedVectorElts`	Antonio Frighetto	18	-131/+143
	A miscompilation issue has been addressed with refined checking.
2023-12-18	[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229)	Simon Pilgrim	4	-262/+94
	We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads). reduceLoadWidth can handle this for scalar loads, but not for vectors. Noticed while triaging D152928
2023-12-18	[Clang][SME] Warn when a function doesn't have ZA state (#75805)	Sam Tebbs	24	-297/+371
	This patch adds a warning that's emitted when a builtin call uses ZA state but the calling function doesn't provide any. Patch by David Sherwood <david.sherwood@arm.com>.
2023-12-18	[mlir] fix filecheck prefixes in a dataflow test (#75794)	Oleksandr "Alex" Zinenko	1	-55/+55
	-SAME and -LITERAL do not compose in CHECK commands.
2023-12-18	aarch64: fix testcase (#75723)	Nathan Sidwell	1	-1/+1
	Add missing < %s to RUN line.
2023-12-18	[InstCombine] Match poison instead of undef in foldVectorBinop()	Nikita Popov	3	-109/+81
	Some negative tests turn into positive tests, as the differences between undef and poison propagation allow additional transforms.
2023-12-18	[Clang][SVE2.1] Add floating-point variants of `svrevd_XX` (#75117)	Momchil Velikov	4	-4/+245

2023-12-18	[InstCombine] Match poison instead of undef in binop of same-mask shuffle fold	Nikita Popov	1	-2/+2

2023-12-18	[SystemZ][z/OS] Add guard for dl_info and dladdr (#75637)	Abhina Sree	1	-10/+18
	This patch fixes the following build error on z/OS `error: unknown type name 'Dl_info'` by adding a guard to check if we have dladdr.
2023-12-18	[InstCombine] Explicitly canonicalize splat shuffles to use poison RHS	Nikita Popov	4	-4/+9
	This is usually handled by demanded elements simplification. However, as that is not supported for scalable vectors, also handle it explicitly here.
2023-12-18	[OpenMP][Clang] Force use of `num_teams` and `thread_limit` for bare kernel ↵	Shilei Tian	6	-104/+130
	(#68373) This patch makes `num_teams` and `thread_limit` mandatory for bare kernels, similar to a regular kernel language, where the grid size has to be set explicitly when launching a kernel.
2023-12-18	[Libomptarget] Remove bitcode image map used for JIT processing (#75672)	Joseph Huber	1	-32/+19
	Summary: Libomptarget supports JIT by treating an LLVM-IR file as a regular input image. The handling here used a global map to keep track of triples once they were parsed. This was done to save time; however, it created a global constructor as well as an extra mutex to handle it. This patch removes the use of this map. Instead, we simply use the file magic to perform a quick check of whether the input image is valid bitcode. If it is, we then create a lazy module. This should be roughly equivalent to the old handling that created an IR symbol table. Here we can prevent the module from materializing everything but the single triple metadata we read in later.
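A file-magic check of this kind only needs the first four bytes of the image. The sketch below tests for the raw bitcode magic (`BC` followed by 0xC0 0xDE) and the bitcode wrapper magic (0x0B17C0DE stored little-endian); it illustrates the idea and is not the plugin's code, and `looks_like_bitcode` is a hypothetical name.

```python
BITCODE_MAGIC = b"BC\xc0\xde"                # raw LLVM bitcode
BITCODE_WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"  # 0x0B17C0DE, little-endian


def looks_like_bitcode(image: bytes) -> bool:
    """Cheap pre-check before attempting to parse the image as a module."""
    return image[:4] in (BITCODE_MAGIC, BITCODE_WRAPPER_MAGIC)


print(looks_like_bitcode(b"BC\xc0\xde" + bytes(12)))  # bitcode header
print(looks_like_bitcode(b"\x7fELF" + bytes(12)))     # an ELF image instead
```

Only images that pass this check need the more expensive step of creating a lazy module and reading the triple.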