Age  Commit message  Author  Files  Lines
2024-06-20[libc][config] Add malloc as baremetal arm entrypoint (#95827)PiJoules2-0/+8
2024-06-20[NFC] fix incorrect #endif comment (#95991)Florian Mayer1-1/+1
2024-06-20[ubsan] Display correct runtime messages for negative _BitInt (#93612)earnol5-13/+305
Without this patch, the compiler-rt ubsan library displays incorrect values for variables of the _BitInt (previously called _ExtInt) type. This patch affects both the generation of metadata inside the code generator and the runtime part. The runtime part is provided only for the i386 and x86_64 runtimes; other runtimes should be updated to take full benefit of this patch. The patch is constructed to be backward compatible, so int and float type runtime diagnostics should be unaffected for runtimes that are not yet updated. This patch fixes issue https://github.com/llvm/llvm-project/issues/64100. Co-authored-by: Vladislav Aranov <vladislav.aranov@ericsson.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
2024-06-20[libc] Move freelist + block to __support (#96231)PiJoules12-96/+98
2024-06-20[WebAssembly] Re-enable reference types by default (#93261)Heejin Ahn3-6/+6
Now that we are about to upgrade emsdk's default node to v18.20.3 (https://github.com/emscripten-core/emsdk/pull/1387), we can re-enable reference-types by default again. This effectively reverts #90792.
2024-06-20[AMDGPU] Introduce a pseudo mnemonic for S_DELAY_ALU in MIR. (#96004)Michael Bedy5-2/+585
2024-06-20[DWIMPrint] Move the setting of the result status into dump_val_object (#96232)Adrian Prantl1-27/+31
Previously the result would get overwritten by a success on all code paths. This is another NFC change for TypeSystemClang, because an object description cannot actually fail there. It will have different behavior in the Swift plugin.
2024-06-20[VPlan] Include IV phi and backedge cost in VPlan cost computation.Florian Hahn4-22/+99
On WebAssembly, non-zero costs are assigned to the backedge and induction phis, so make sure we include those costs in the VPlan-based cost model. This fixes a downstream crash with WebAssembly after 242cc200ccb (https://github.com/llvm/llvm-project/pull/92555)
2024-06-20[vscode-mlir] Bump the version of braces to 3.0.3 (#96137)Stella Stamenova1-14/+14
Version 3.0.2 of braces has a security vulnerability.
2024-06-20[libc++] Remove <ostream> include from <chrono> (#96035)Nikolas Klauser4-3/+2
2024-06-20[clang] Define ptrauth_sign_constant builtin. (#93904)Ahmed Bougacha14-15/+395
This is a constant-expression equivalent to ptrauth_sign_unauthenticated. Its constant nature lets us guarantee a non-attackable sequence is generated, unlike ptrauth_sign_unauthenticated which we generally discourage using. It being a constant also allows its usage in global initializers, though requiring constant pointers and discriminators. The value must be a constant expression of pointer type which evaluates to a non-null pointer. The key must be a constant expression of type ptrauth_key. The extra data must be a constant expression of pointer or integer type; if an integer, it will be coerced to ptrauth_extra_data_t. The result will have the same type as the original value. This can be used in constant expressions. Co-authored-by: John McCall <rjmccall@apple.com>
2024-06-20[clang] Define ptrauth_string_discriminator builtin. (#93903)Ahmed Bougacha9-6/+109
This exposes the ABI-stable hash function that allows computing a 16-bit discriminator from a constant string. This allows manually matching the implicit string discriminators computed in the ABI (e.g., from mangled names for vtable pointer/entry signing), as well as enabling the use of interesting discriminators when manually annotating specific pointers with the __ptrauth qualifier. The argument must be a string literal of char character type. The result has type ptrauth_extra_data_t. The result value is never zero and always within range for both the __ptrauth qualifier and ptrauth_blend_discriminator. This can be used in constant expressions. Co-authored-by: John McCall <rjmccall@apple.com>
2024-06-20[clang][Interp] Try to fix #embed on big-endian machinesTimm Bäder1-5/+14
Insert a cast to the proper value.
2024-06-20Revert "[DebugInfo][BPF] Add 'annotations' field for DIBasicType & DI… ↵eddyz8711-295/+90
(#96172) …SubroutineType (#91422)" This reverts commit 3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c. As reported in [1,2] the commit above causes CI failure for powerpc-aix target. There is also a performance regression reported in [3]. Reverting to comply with the developer policy. [1] https://github.com/llvm/llvm-project/pull/91422#issuecomment-2179425473 [2] https://lab.llvm.org/buildbot/#/builders/64/builds/62 [3] https://github.com/llvm/llvm-project/pull/91422#issuecomment-2175631443
2024-06-20[OpenMP][libomp] Remove Perl in favor of Python (#95307)Jonathan Peyton20-5618/+987
* Removes all Perl scripts and modules * Adds Python3 scripts which mimic the behavior of the Perl scripts * Removes Perl from CMake; Adds Python3 requirement to CMake * The check-instruction-set.pl script is Knights Corner specific. The script is removed and not replicated with a corresponding Python3 script. Relevant Discourse: https://discourse.llvm.org/t/error-compiling-clang-with-offloading-support/79223/4 Fixes: https://github.com/llvm/llvm-project/issues/62289
2024-06-20[libc++abi] Use target_compile_options to pass ↵Louis Dionne2-1/+2
LIBCXXABI_ADDITIONAL_COMPILE_FLAGS (#96112) We use target_compile_options to pass the libc++ variant of this flag, so we should be consistent for libc++abi. This is actually not only a matter of consistency: target_compile_options handles duplicate CMake options in a certain way (it removes duplicates but has an escape hatch using the "SHELL:" prefix), and it is important for both libc++ and libc++abi options to be handled in the same way.
2024-06-20Reland "[ThinLTO] Populate declaration import status except for distributed ↵Mingming Liu9-85/+449
ThinLTO under a default-off new option" (#95482) Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of https://github.com/llvm/llvm-project/pull/92718 * `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements is small. * While the initial bucket count is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The Programmer's Manual [3] also mentions it could waste space. * Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing RAM usage. The growth factor of `unordered_map` is at most 2 in libc++ [4] for insert operations. With this change, the `ComputeCrossModuleImport` RAM increase is smaller than 0.5G on a couple of binaries with high indexing RAM usage. A wider range of (pre-release) tests pass. [1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 [2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849 [3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h [4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526 **Original commit message** The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to the backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be a no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. 
Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](https://github.com/llvm/llvm-project/pull/87597#discussion_r1556067195)) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, https://github.com/llvm/llvm-project/pull/87600 will do this.
2024-06-20[BranchFolder] Fix missing debug info with tail merging (#94715)Alan Zhao4-16/+128
`BranchFolder::TryTailMergeBlocks(...)` removes unconditional branch instructions and then recreates them. However, this process loses debug source location information from the previous branch instruction, even if tail merging doesn't change IR. This patch preserves the debug information from the removed instruction and inserts them into the recreated instruction. Fixes #94050
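A minimal sketch of the idea, using toy stand-ins rather than LLVM's real `DebugLoc`, `MachineInstr`, and `MachineBasicBlock` types. The names `Instr`, `Block`, `removeTrailingBranch`, and `recreateBranch` are hypothetical and only illustrate saving the location of the removed branch and reusing it for the recreated one:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for the real LLVM types (assumptions for illustration only).
struct DebugLoc {
  unsigned line = 0; // 0 means "no location"
};

struct Instr {
  std::string opcode;
  DebugLoc loc;
};

using Block = std::vector<Instr>;

// Removes a trailing unconditional branch and returns its debug location so
// the caller can reuse it. Returns nullopt if the block has no such branch.
std::optional<DebugLoc> removeTrailingBranch(Block &bb) {
  if (bb.empty() || bb.back().opcode != "br")
    return std::nullopt;
  DebugLoc saved = bb.back().loc;
  bb.pop_back();
  return saved;
}

// Recreates the branch, carrying over the location of the removed one
// instead of emitting the new branch with an empty location.
void recreateBranch(Block &bb) {
  if (std::optional<DebugLoc> saved = removeTrailingBranch(bb))
    bb.push_back(Instr{"br", *saved});
}
```

With this shape, a block whose branch is removed and recreated without any other change ends up with the same source location it started with.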
2024-06-20[lldb] Make LanguageRuntime::GetTypeBitSize return an optional (NFC) (#96013)Jonas Devlieghere4-21/+22
Make LanguageRuntime::GetTypeBitSize return an optional. This should be NFC, though the ObjCLanguageRuntime implementation is (possibly) more defensive against returning 0. I'm not sure if it's possible for both `m_ivar.size` and `m_ivar.offset` to be zero. Previously, we'd return 0 and cache it, only to discard it the next time we found it in the cache and recompute it. The new code avoids putting it in the cache in the first place.
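The caching behavior described above can be sketched as follows. This is not the LLDB implementation: `BitSizeCache`, `computeSlow`, and the type names are hypothetical, and the sketch only shows why returning `std::optional` lets unknown sizes stay out of the cache:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by type name; only known sizes are stored.
class BitSizeCache {
public:
  std::optional<uint64_t> getTypeBitSize(const std::string &name) {
    auto it = cache_.find(name);
    if (it != cache_.end())
      return it->second; // every cached entry is a real size
    ++slowPathCalls;
    std::optional<uint64_t> size = computeSlow(name);
    if (size)
      cache_.emplace(name, *size); // a failed lookup is never cached
    return size;
  }

  int slowPathCalls = 0; // counts how often the slow path runs

private:
  // Stand-in for the expensive runtime query (assumed for illustration).
  static std::optional<uint64_t> computeSlow(const std::string &name) {
    if (name == "int32")
      return uint64_t{32};
    if (name == "int64")
      return uint64_t{64};
    return std::nullopt;
  }

  std::unordered_map<std::string, uint64_t> cache_;
};
```

Because the cache value type is a plain integer rather than "0 meaning unknown", a hit never needs to be second-guessed.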
2024-06-20[Clang] [Sema] Diagnose unknown std::initializer_list layout in SemaInit ↵Mital Ashok33-108/+195
(#95580) This checks whether the layout of `std::initializer_list` is something Clang can handle much earlier, and deduplicates the checks in CodeGen/CGExprAgg.cpp and AST/ExprConstant.cpp. Also now diagnoses `union initializer_list` (Fixes #95495), a bit-field for the size (fixes a crash that would happen during codegen if it were unnamed), base classes (that wouldn't be initialized) and polymorphic classes (whose vtable pointer wouldn't be initialized).
2024-06-20[IR] Remove RepeatedPass (#96211)Nikita Popov7-202/+3
This pass is not used in any pipeline, barely used in tests and not really useful, so drop it. The only place where we "repeat" passes is devirt repetition, and that is done using a separate pass.
2024-06-20[libc][math][c23] Add {getpayload,setpayload,setpayloadsig}f16 C23 math ↵OverMighty21-2/+519
functions (#95159) Part of #93566.
2024-06-20Reformat test (NFC)Adrian Prantl1-16/+6
2024-06-20Factor out expression result error strings.Adrian Prantl5-46/+68
2024-06-20Refactor GetObjectDescription() to return llvm::Expected (NFC)Adrian Prantl12-103/+143
This is de facto an NFC change for Objective-C but will benefit the Swift language plugin.
2024-06-20Convert ValueObject::Dump() to return llvm::Error() (NFCish)Adrian Prantl20-44/+108
This change by itself has no measurable effect on the LLDB testsuite. I'm making it in preparation for threading through more errors in the Swift language plugin.
2024-06-20[C99] Claim partial conformance to n448Aaron Ballman2-1/+54
This is the paper that added the 'restrict' keyword. Clang is conforming to the letter of the standard's requirements, so it would be defensible for us to claim full support instead. However, LLVM does not currently support the optimization semantics with restricted local variables or data members, only with restricted pointers declared in function parameters. So we're only claiming partial support because we don't yet take full advantage of what the feature allows.
2024-06-20Revert "[lldb][ObjC] Don't query objective-c runtime for decls in C++ ↵Michael Buch3-29/+1
contexts (#95963)" This reverts commit dadf960607bb429baebd3f523ce5b93260a154d2. The commit caused `TestEarlyProcessLaunch.py` to fail on the macOS bots.
2024-06-20[lld][WebAssembly] Handle stub symbol dependencies when an explicit import ↵Sam Clegg4-42/+60
name is used (#80169)
2024-06-20[lldb] Give more time to test/API/multiple-debuggersAdrian Prantl1-25/+30
This test occasionally fails on two of the busiest CI bots (asan and matrix), and we can't reproduce it locally. This leads to the hypothesis that the test is timing out (in the sense of the number of "join attempts" performed by this test's driver). This commit doubles the number of iterations performed and also does an NFC refactor of the main test loop so that it can be more easily understood.
2024-06-20[mlir][vector] Update tests for collapse 3/n (nfc) (#94906)Andrzej Warzyński2-44/+78
The main goal of this PR (and subsequent PRs) is to add more tests with scalable vectors to: * vector-transfer-collapse-inner-most-dims.mlir There are quite a few cases to consider, hence this is split into multiple PRs. In this PR, the very first test for `vector.transfer_write` is complemented with all the possible combinations: * scalable (rather than fixed) unit trailing dim, * dynamic (rather than static) trailing dim in the source memref. To this end, the following tests: * `@leading_scalable_dimension_transfer_write` and `@trailing_scalable_one_dim_transfer_write` are replaced with: * `@drop_two_inner_most_dim_scalable_inner_dim` and `@negative_scalable_unit_dim`, respectively. In addition: * "_for_transfer_write" is removed from function names (to reduce noise). In addition, to maintain consistency between the tests for `xfer_read` and `xfer_write`, 2 negative tests for `xfer_read` are also renamed. This is to follow the suggestion made during the review of this PR. Extra comments in "VectorTransforms.cpp" are added to better document the limitations related to scalable vectors and which of the tests added here exercise them. This is a follow-up for: #94490 and #94604 NOTE: This PR is limited to tests for `vector.transfer_write`.
2024-06-20Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"Florian Hahn8-28/+421
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92. Extra tests for crashes discovered when building Chromium have been added in fb86cb7ec157689e, 3be7312f81ad2. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555
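A rough sketch of the selection scheme described in the original message, under stated assumptions: `Recipe`, `Candidate`, and `pickBest` are hypothetical names, and comparing candidates by cost per lane is an assumption made for illustration, not necessarily the exact comparison the vectorizer uses. Each recipe falls back to a legacy cost until it computes its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical recipe: uses its own cost once migrated, otherwise the
// cost reported by the legacy model.
struct Recipe {
  std::optional<unsigned> ownCost;
  unsigned legacyCost;
  unsigned cost() const { return ownCost.value_or(legacyCost); }
};

// Hypothetical (plan, VF) candidate; its cost is the sum of recipe costs.
struct Candidate {
  unsigned vf;
  std::vector<Recipe> recipes;
  unsigned cost() const {
    unsigned total = 0;
    for (const Recipe &r : recipes)
      total += r.cost();
    return total;
  }
};

// Returns the index of the candidate with the lowest cost per lane.
// Precondition: cs is non-empty and every vf is greater than zero.
// cost_a / vf_a < cost_b / vf_b is compared by cross-multiplication.
std::size_t pickBest(const std::vector<Candidate> &cs) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < cs.size(); ++i) {
    uint64_t lhs = uint64_t{cs[i].cost()} * cs[best].vf;
    uint64_t rhs = uint64_t{cs[best].cost()} * cs[i].vf;
    if (lhs < rhs)
      best = i;
  }
  return best;
}
```

The fallback in `Recipe::cost` mirrors the gradual migration the message describes: recipes can be moved to their own cost computation one at a time without changing the selection loop.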
2024-06-20[LV] Add tail folding test with scalarized store and wide header mask.Florian Hahn1-0/+225
Add an additional test with a scalarized store which caused crashes with earlier versions of https://github.com/llvm/llvm-project/pull/92555.
2024-06-20[clang] Fix missing installed header (#95979)Daniel Otero1-1/+3
Since commit 8d468c132eed7ffe34d601b224220efd51655eb3, the header `openmp_wrappers/complex` is hidden behind `openmp_wrappers/complex.h` due to a bug in CMake[^1], so it is not actually installed. To test the issue, you can ask `ninja` to generate the file in your build:
```
$ ninja lib/clang/19/include/openmp_wrappers/complex.h
[199/199] Copying clang's openmp_wrappers/complex.h...
$ ninja lib/clang/19/include/openmp_wrappers/complex
ninja: error: unknown target 'lib/clang/19/include/openmp_wrappers/complex', did you mean 'lib/clang/19/include/openmp_wrappers/complex.h'?
```
Re-ordering the entries works around the issue. The other option is to revert the cited commit, but I'm not sure which approach is preferred. CC @etcwilde @jdoerfert
[^1]: [Here](https://gitlab.kitware.com/cmake/cmake/-/issues/26058) is the CMake report on the issue.
2024-06-20[Clang][Comments] Support for parsing headers in Doxygen \par commands (#91100)hdoc8-11/+239
### Background
Doxygen's `\par` command ([link](https://www.doxygen.nl/manual/commands.html#cmdpar)) has an optional argument, which denotes the header of the paragraph started by a given `\par` command. In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:
```
$ cat test.cpp
/// \par User defined paragraph:
/// Contents of the paragraph.
///
/// \par
/// New paragraph under the same heading.
///
/// \par
/// A second paragraph.
class A {};

$ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp
`-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition
  |-FullComment 0x1530fea38 <line:2:4, line:9:23>
  | |-ParagraphComment 0x1530fe7e0 <line:2:4>
  | | `-TextComment 0x1530fe7b8 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par"
  | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30>
  | |   |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:"
  | |   `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph."
  | |-ParagraphComment 0x1530fe8c0 <line:5:4>
  | | `-TextComment 0x1530fe898 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par"
  | | `-ParagraphComment 0x1530fe930 <col:4, col:41>
  | |   `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading."
  | |-ParagraphComment 0x1530fe978 <line:8:4>
  | | `-TextComment 0x1530fe950 <col:4> Text=" "
  | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par"
  |   `-ParagraphComment 0x1530fe9e8 <col:4, col:23>
  |     `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph."
  `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A
```
As we can see above, the optional paragraph heading (`"User defined paragraph"`) is not an argument of the `\par` `BlockCommandComment`, but instead a child `TextComment`.
For documentation generators like [hdoc](https://hdoc.io/), it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible.
### Change
This change parses `\par` commands the way Doxygen parses them, making an optional header available as an argument if it is present. In addition:
- AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc.
- TableGen is updated with an `IsParCommand` to support this functionality
- `lit` tests are updated where needed
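To illustrate the intended semantics (not Clang's actual comment parser), here is a minimal sketch that extracts the optional header from a single `\par` line. The function name `parseParHeader` is hypothetical, and the sketch deliberately does not distinguish "no `\par` on this line" from "`\par` without a header":

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Returns the header text following "\par" on the given comment line, or
// nullopt if the line has no "\par" command or the command has no header.
std::optional<std::string> parseParHeader(const std::string &line) {
  const std::string command = "\\par";
  const std::size_t pos = line.find(command);
  if (pos == std::string::npos)
    return std::nullopt;

  std::size_t begin = pos + command.size();
  // Reject longer commands that merely start with "\par", such as "\param".
  if (begin < line.size() && line[begin] != ' ' && line[begin] != '\t')
    return std::nullopt;

  // Skip the whitespace between the command and the header.
  begin = line.find_first_not_of(" \t", begin);
  if (begin == std::string::npos)
    return std::nullopt; // "\par" with nothing after it: no header

  // Trim trailing whitespace from the header.
  const std::size_t end = line.find_last_not_of(" \t");
  return line.substr(begin, end - begin + 1);
}
```

Applied to the example above, the first `\par` line yields the header `User defined paragraph:`, while the two bare `\par` lines yield no header.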
2024-06-20[AArch64] Consider runtime mode when deciding to use SVE for fixed-length ↵Sander de Smalen56-379/+441
vectors. (#96081) This also fixes the case where an SVE div is incorrectly assumed to be available in non-streaming mode with SME.
2024-06-20[PassManager] Remove some unnecessary includes (NFC) (#96175)Nikita Popov13-5/+16
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only needed for the UseNewDbgInfoFormat declaration, which can be moved to the places that need it.
2024-06-20[llvm][AArch64] SVE2 is an optional feature in ARMv9.0a (#96007)Jon Roelofs4-14/+33
... so move it out of the `implied_features` list, and into the `DefaultExts` list.
2024-06-20[PPC] Add DwarfRegAlias for VSRPair (#95837)Zaara Syeda2-1/+2
Add DwarfRegAlias for VSRPair as it shares dwarfRegNum with the VR registers.
2024-06-20[GenericDomTreeConstruction] Use SmallVector (NFC) (#96138)Kazu Hirata1-1/+1
The use of SmallVector here saves 4.7% of heap allocations during the compilation of ConvertExpr.cpp.ii, a preprocessed version of flang/lib/Lower/ConvertExpr.cpp.
2024-06-20[libc][arm] implement a basic setjmp/longjmp (#93220)Nick Desaulniers (paternity leave)9-7/+178
Note: our baremetal arm configuration compiles this as `--target=arm-none-eabi`, so this code is built in -marm mode. It could be smaller with `--target=armv7-none-eabi -mthumb`. The assembler is valid ARMv5, or THUMB2, but not THUMB(1).
2024-06-20[mlir] Expose skipRegions option for Op printing in the C and Python ↵Jonas Rickert7-13/+51
bindings (#96150) The MLIR C and Python bindings expose various methods from `mlir::OpPrintingFlags`. This PR adds a binding for the `skipRegions` method, which allows skipping the printing of regions when printing ops. It also exposes this option as a parameter in the Python `get_asm` and `print` methods.
2024-06-20[RISCV][NFC] Cleanup SCR1 sched model (#96088)Anton Sidorenko1-2/+0
Related to https://github.com/llvm/llvm-project/pull/95948
2024-06-20[Clang][AMDGPU] Add a builtin for `llvm.amdgcn.make.buffer.rsrc` intrinsic ↵Shilei Tian6-0/+218
(#95276) Depends on https://github.com/llvm/llvm-project/pull/94830.
2024-06-20[Support] Vendor rpmalloc in-tree and use it for the Windows 64-bit release ↵Alexandre Ganea11-4/+5547
(#91862)
### Context
We have a longstanding performance issue on Windows, where to this day, the default heap allocator is still lock-based. With the number of cores increasing, building and using LLVM with the default Windows heap allocator is sub-optimal. Notably, the ThinLTO link times with LLD are extremely long, and increase proportionally with the number of cores in the machine. In https://github.com/llvm/llvm-project/commit/a6a37a2fcd2a8048a75bd0d8280497ed89d73224, I introduced the ability to build LLVM with several popular lock-free allocators. Downstream users however have to build their own toolchain with this option, and building an optimal toolchain is a bit tedious and long. Additionally, LLVM is now integrated into Visual Studio, which AFAIK re-distributes the vanilla LLVM binaries/installer. The point being that many users are impacted and might not be aware of this problem, or are unable to build a more optimal version of the toolchain. The symptom before this PR is that most of the CPU time goes to the kernel (darker blue) when linking with ThinLTO: ![16c_ryzen9_windows_heap](https://github.com/llvm/llvm-project/assets/37383324/86c3f6b9-6028-4c1a-ba60-a2fa3876fba7) With this PR, most time is spent in user space (light blue): ![16c_ryzen9_rpmalloc](https://github.com/llvm/llvm-project/assets/37383324/646b88f3-5b6d-485d-a2e4-15b520bdaf5b) On higher core count machines, before this PR, the CPU usage becomes pretty much flat because of contention: <img width="549" alt="VM_176_windows_heap" src="https://github.com/llvm/llvm-project/assets/37383324/f27d5800-ee02-496d-a4e7-88177e0727f0"> With this PR, similarly, most CPU time is now used: <img width="549" alt="VM_176_with_rpmalloc" src="https://github.com/llvm/llvm-project/assets/37383324/7d4785dd-94a7-4f06-9b16-aaa4e2e505c8">
### Changes in this PR
The avenue I've taken here is to vendor/re-licence rpmalloc in-tree, and use it when building the Windows 64-bit release. Given the permissive rpmalloc licence, prior discussions with the LLVM foundation and @lattner suggested this vendoring. Rpmalloc's author (@mjansson) kindly agreed to ~~donate~~ re-licence the rpmalloc code in LLVM (please do correct me if I misinterpreted our past communications). I've chosen rpmalloc because it's small and gives the best value overall. The source code is only 4 .c files. Rpmalloc statically replaces the weak CRT alloc symbols at link time, and has no dynamic patching like mimalloc. As an alternative, there were several unsuccessful attempts made by Russell Gallop to use SCUDO in the past; please see the thread in https://reviews.llvm.org/D86694. If later someone comes up with a PR of similar performance that uses SCUDO, we could then delete this vendored rpmalloc folder. I've added a new cmake flag `LLVM_ENABLE_RPMALLOC` which essentially sets `LLVM_INTEGRATED_CRT_ALLOC` to the in-tree rpmalloc source.
### Performance
The most obvious test is profiling a ThinLTO linking step with LLD. I've used a Clang compilation as a testbed, ie.
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 -Xclang -O3 -fstrict-aliasing -march=native -flto=thin -fwhole-program-vtables -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%" -DLLVM_ENABLE_LTO=THIN
```
I've profiled the linking step with no LTO cache, with Powershell, such as:
```
Measure-Command { lld-link /nologo @CMakeFiles\clang.rsp /out:bin\clang.exe /implib:lib\clang.lib /pdb:bin\clang.pdb /version:0.0 /machine:x64 /STACK:10000000 /DEBUG /OPT:REF /OPT:ICF /INCREMENTAL:NO /subsystem:console /MANIFEST:EMBED,ID=1 }
```
Timings:

| Machine | Allocator | Time to link |
|--------|--------|--------|
| 16c/32t AMD Ryzen 9 5950X | Windows Heap | 10 min 38 sec |
| | **Rpmalloc** | **4 min 11 sec** |
| 32c/64t AMD Ryzen Threadripper PRO 3975WX | Windows Heap | 23 min 29 sec |
| | **Rpmalloc** | **2 min 11 sec** |
| | **Rpmalloc + /threads:64** | **1 min 50 sec** |
| 176 vCPU (2 socket) Intel Xeon Platinum 8481C (fixed clock 2.7 GHz) | Windows Heap | 43 min 40 sec |
| | **Rpmalloc** | **1 min 45 sec** |

This also improves the overall performance when building with clang-cl. I've profiled a regular compilation of clang itself, ie:
```
set OPTS=/GS- /D_ITERATOR_DEBUG_LEVEL=0 /arch:AVX -fuse-ld=lld
cmake -G Ninja %ROOT%/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=TRUE -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_PDB=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_C_COMPILER=clang-cl.exe -DCMAKE_CXX_COMPILER=clang-cl.exe -DCMAKE_LINKER=lld-link.exe -DLLVM_ENABLE_LLD=ON -DCMAKE_CXX_FLAGS="%OPTS%" -DCMAKE_C_FLAGS="%OPTS%"
```
This saves approx. 30 sec when building on the Threadripper PRO 3975WX:
```
(default Windows Heap)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
  Time (mean ± σ): 392.716 s ± 3.830 s [User: 17734.025 s, System: 1078.674 s]
  Range (min … max): 390.127 s … 399.449 s 5 runs

(rpmalloc)
C:\src\git\llvm-project>hyperfine -r 5 -p "make_llvm.bat stage1_test2" "ninja clang -C stage1_test2"
Benchmark 1: ninja clang -C stage1_test2
  Time (mean ± σ): 360.824 s ± 1.162 s [User: 15148.637 s, System: 905.175 s]
  Range (min … max): 359.208 s … 362.288 s 5 runs
```
2024-06-20[clang] Move 'alpha.cplusplus.MisusedMovedObject' to 'cplusplus.Move' in ↵Balázs Kéri1-19/+19
documentation (NFC) (#95003) The checker was renamed some time ago but the documentation was not updated. The section is now just moved and renamed. The documentation is still very simple and needs improvement.
2024-06-20[RISCV] Strength reduce mul by 2^N - 2^M (#88983)Philip Reames14-374/+422
This is a three-instruction expansion and does not depend on zba, so most of the test changes are in base RV32/64I configurations. With zba, this gets immediates such as 14, 28, 30, 56, 60, and 62, which aren't covered by our other expansions.
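The expansion rests on the identity x * (2^N - 2^M) = (x << N) - (x << M), which is two shifts and a subtract. The sketch below is not the RISC-V backend code; `decomposeAsPow2Difference` and `mulByConstant` are hypothetical helpers that detect such constants and apply the identity, and the sketch chooses to leave plain powers of two alone since a single shift already handles them:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct ShiftPair {
  unsigned n; // larger shift amount
  unsigned m; // smaller shift amount
};

// Number of trailing zero bits; the caller guarantees v != 0.
static unsigned countTrailingZeros(uint64_t v) {
  unsigned k = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++k;
  }
  return k;
}

// If c == 2^N - 2^M with N > M + 1, returns {N, M}. Such a constant is a
// contiguous run of set bits, so adding its lowest set bit must produce a
// single power of two.
std::optional<ShiftPair> decomposeAsPow2Difference(uint64_t c) {
  if (c == 0)
    return std::nullopt;
  uint64_t low = c & (~c + 1); // lowest set bit, 2^M
  uint64_t sum = c + low;      // 2^N if c has the required shape
  if (sum == 0 || (sum & (sum - 1)) != 0)
    return std::nullopt;       // overflowed, or not a power of two
  unsigned n = countTrailingZeros(sum);
  unsigned m = countTrailingZeros(low);
  if (n == m + 1)
    return std::nullopt;       // c is itself a power of two
  return ShiftPair{n, m};
}

// Multiplies using shift, shift, subtract when the constant allows it.
uint64_t mulByConstant(uint64_t x, uint64_t c) {
  if (std::optional<ShiftPair> p = decomposeAsPow2Difference(c))
    return (x << p->n) - (x << p->m);
  return x * c;
}
```

For example, 14 decomposes as 2^4 - 2^1, so `mulByConstant(x, 14)` computes `(x << 4) - (x << 1)`.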
2024-06-20[clang][Interp] Nested ThisExprs that don't refer to the frame this ptrTimm Bäder6-18/+93
Use a series of ops in that case, getting us to the right declaration field.
2024-06-20[SPIRV] Add trig function lowering (#95973)Farzon Lotfi13-3/+386
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This is part 2 of 4 PRs. It sets the groundwork for adding the intrinsics. Add SPIRV lowering for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 There isn't any aarch64 change in this PR, but when you add a target opcode it is visible in their validation tests.
2024-06-20[AArch64][TargetParser] Split FMV and extensions (#92882)Tomas Matheson14-256/+283
FMV extensions are really just mappings from FMV feature names to lists of backend features for codegen. Split them out into their own separate file.