path: root/llvm
Age | Commit message | Author | Files | Lines
2 hours[GitHub][CI] Add clang-tidy premerge workflow (#154829)HEADmainBaranov Victor3-0/+654
**KEY POINTS**
- MVP workflow to automatically lint C++ files located **only** in `clang-tools-extra/clang-tidy`. This directory was chosen because it is almost 100% `clang-tidy` compliant, so the workflow automatically enforces a [high quality standard for clang-tidy source code](https://discourse.llvm.org/t/rfc-create-hardened-clang-tidy-config-for-clang-tidy-directory/87247). (https://github.com/llvm/llvm-project/pull/147793)
- The implementation is very similar to the code-format job, but without the ability to run `code-lint-helper.py` locally.

**FOUND ISSUES** + open questions
- Speed: it takes ~1m40s to download and unpack `clang-tidy`, plus an additional ~4 minutes to configure and build the CodeGen targets. `premerge.yaml` runs on the special `llvm-premerge-linux-runners` runners, which can use `sccache` for speed. Can we use the same runners for this job? Exact timings can be found [here](https://github.com/llvm/llvm-project/actions/runs/17135749067/job/48611150680?pr=154223).

**TO DO**
- Place common components from `code-lint-helper.py` and `code-format-helper.py` into a separate file and reuse it in both CI jobs.
- Compute CodeGen targets based on `projects_to_build`; for now `clang-tablegen-targets` is hardcoded for `clang-tools-extra/`.
- Automatically generate and upload `.yaml` for `clang-apply-replacements`.
- Create an RFC with a plan for enabling `clang-tidy` in other projects, so that maintainers of LLVM components can choose whether they want `clang-tidy` or not.
- Add more linters like `pylint` and `ruff` in the future.
2 hours[LV] Add additional tests for scalar load costs of addresses.Florian Hahn1-0/+106
3 hours[LV] Regenerate check lines without dce/instcombine.Florian Hahn1-1409/+1975
Remove dce,instcombine from run lines from test to make it easier to check the output generated by LV.
4 hours[SCEV] Add tests that benefit from rewriting SCEVAddExpr with guards.Florian Hahn4-0/+245
Add additional tests benefiting from rewriting existing SCEVAddExprs with guards.
5 hours[Object] Add a missing space to a diagnostic (#159826)Nico Weber1-1/+1
Follow-up to https://reviews.llvm.org/D46527
6 hours[ADT] Use a C++17 fold expression in hash_combine (NFC) (#159901)Kazu Hirata1-14/+9
combine() combines hash values with recursion on variadic parameters. This patch replaces the recursion with a C++17 fold expression:

    (combine_data(length, buffer_ptr, buffer_end, get_hashable_data(args)), ...);

which expands to:

    combine_data(length, buffer_ptr, buffer_end, get_hashable_data(a));
    combine_data(length, buffer_ptr, buffer_end, get_hashable_data(b));
    combine_data(length, buffer_ptr, buffer_end, get_hashable_data(c));
    :

A key benefit of this change is the unification of the recursive step and the base case. The argument processing and finalization logic now exist as straight-line code within a single function.

combine_data now takes buffer_ptr by reference. This is necessary because the previous assignment pattern:

    buffer_ptr = combine_data(...)

is syntactically incompatible with a fold expression. The new pattern:

    (combine_data(...), ...)

discards return values, so combine_data must update buffer_ptr directly.

For readability, this patch does the bare minimum to use a fold expression, leaving further cleanups to subsequent patches. For example, buffer_ptr and buffer_end could become member variables, and several comments that mention recursion still need updating.
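The recursion-to-fold rewrite described above can be sketched in isolation. This is a minimal illustration, not the ADT code: `mix_one`, `combine_recursive`, `combine_fold`, and the FNV-style constants are hypothetical stand-ins for `combine_data` and the real hashing state, chosen only to show why the helper has to update its state by reference once its return value is discarded by the comma fold.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical seed and multiplier (FNV-1a style); not LLVM's constants.
constexpr std::uint64_t kSeed = 0xcbf29ce484222325ULL;
constexpr std::uint64_t kPrime = 0x100000001b3ULL;

// Stand-in for combine_data: mixes one value into the running state.
// State is taken by reference so the caller can ignore the return value.
inline void mix_one(std::uint64_t &state, std::size_t &length,
                    std::uint64_t value) {
  state = (state ^ value) * kPrime;
  ++length;
}

// Recursive formulation: a separate base case performs finalization.
inline std::uint64_t combine_recursive(std::uint64_t state,
                                       std::size_t length) {
  return state ^ length;
}

template <typename T, typename... Ts>
std::uint64_t combine_recursive(std::uint64_t state, std::size_t length,
                                const T &arg, const Ts &...args) {
  mix_one(state, length, static_cast<std::uint64_t>(arg));
  return combine_recursive(state, length, args...);
}

// Fold formulation: argument processing and finalization are straight-line
// code in one function. The comma fold evaluates left to right.
template <typename... Ts> std::uint64_t combine_fold(const Ts &...args) {
  std::uint64_t state = kSeed;
  std::size_t length = 0;
  (mix_one(state, length, static_cast<std::uint64_t>(args)), ...);
  return state ^ length;
}
```

Both formulations visit the arguments in the same order, so they produce the same value for the same inputs; the fold version simply has no separate base-case overload.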
7 hours[llvm] Proofread GettingStarted.rst (#159904)Kazu Hirata1-14/+14
7 hours[IR] Simplify dispatchRecalculateHash and dispatchResetHash (NFC) (#159903)Kazu Hirata2-16/+10
This patch simplifies dispatchRecalculateHash and dispatchResetHash with "constexpr if". This patch does not inline dispatchRecalculateHash and dispatchResetHash into their respective call sites. Using "constexpr if" in a non-template context like MDNode::uniquify would still require the discarded branch to be syntactically valid, causing a compilation error for node types that do not have recalculateHash/setHash. Using template functions ensures that the "constexpr if" is evaluated in a proper template context, allowing the compiler to fully discard the inactive branch.
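The point about needing a template context can be illustrated with a small sketch. The node types and the `HasCachedHash` flag below are hypothetical simplifications (the real code uses a detection trait on `MDNode`); the sketch only shows that the discarded `if constexpr` branch is not instantiated when the condition depends on a template parameter.

```cpp
// Hypothetical node types; the boolean flag stands in for a detection trait.
struct CachedNode {
  static constexpr bool HasCachedHash = true;
  unsigned Hash = 0;
  void recalculateHash() { Hash = 42; }
};

struct PlainNode {
  static constexpr bool HasCachedHash = false;
  // Deliberately has no recalculateHash() member.
};

// Because NodeTy is a template parameter, the branch that is not taken is
// discarded without being instantiated, so PlainNode compiles even though
// it has no recalculateHash().
template <typename NodeTy> bool dispatchRecalculate(NodeTy &N) {
  if constexpr (NodeTy::HasCachedHash) {
    N.recalculateHash();
    return true;
  } else {
    return false;
  }
}
```

Writing the same `if constexpr` directly in a non-template function with a concrete `PlainNode` would fail to compile, because the call to `recalculateHash()` would still be checked.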
7 hours[IR] Modernize HasCachedHash (NFC) (#159902)Kazu Hirata1-7/+3
This patch modernizes HasCachedHash.
- "struct SFINAE" is replaced with identically defined SameType.
- The return types Yes and No are replaced with std::true_type and std::false_type.

My previous attempt (#159510) to clean up HasCachedHash failed on clang++-18, but this version works with clang++-18.
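A sketch of the detection idiom described above, under the assumption that the trait checks for a `void setHash(unsigned)` member. The names `HasSetHash`, `Hashed`, and `Unhashed` are illustrative only, and `SameType` is defined locally here rather than taken from LLVM.

```cpp
#include <type_traits>

// Helper that is only well-formed when Val has exactly type U.
template <typename U, U Val> struct SameType {};

template <typename NodeTy> struct HasSetHash {
  // Selected when &U::setHash exists with the expected signature.
  template <typename U>
  static std::true_type check(SameType<void (U::*)(unsigned), &U::setHash> *);
  // Fallback for every other type (substitution failure is not an error).
  template <typename U> static std::false_type check(...);

  static constexpr bool value = decltype(check<NodeTy>(nullptr))::value;
};

// Hypothetical node types for illustration.
struct Hashed {
  void setHash(unsigned) {}
};
struct Unhashed {};

static_assert(HasSetHash<Hashed>::value);
static_assert(!HasSetHash<Unhashed>::value);
```

Returning `std::true_type` and `std::false_type` lets the result be read with `decltype(...)::value` instead of comparing `sizeof` of two differently sized marker types.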
7 hours[ADT] Move IsSizeLessThanThresholdT into AdjustedParamTBase (NFC) (#159900)Kazu Hirata1-7/+3
This patch moves IsSizeLessThanThresholdT into AdjustedParamTBase, the sole user of the helper, while switching to a type alias.

Aside from moving the helper closer to where it's used, another benefit is that we can assume that T is a complete type inside AdjustedParamTBase. Note that sizeof(T) serves as a check for a complete type. Inside AdjustedParamTBase, we only pass complete non-void types to:

    std::is_trivially_copy_constructible<T>
    std::is_trivially_move_constructible<T>

so we can safely drop the fallback case implemented with std::false_type.
8 hours[LLVM] Remove leftover unnecessary CMake for GPU runtimesJoseph Huber1-8/+0
Summary: This somehow snuck back in.
9 hours[MemProf] Propagate function call assignments to newly cloned nodes (#159907)Teresa Johnson2-12/+85
There are a couple of places during function cloning where we may create new callsite clone nodes. One of those places was correctly propagating the assignment to which function clone it should call, and one was not. Refactor this handling into a helper and use in both places so the newly created callsite clones actually call the assigned callee function clones.
9 hours[llvm][test][CGPluginTest] Fix plugin path again (#159923)Raul Tambre1-1/+1
I forgot to remove a bunch of the intermediary path. That's what I get for not waiting for my local build to finish.

Fixes: 47c1b650626043f0a8f8e32851617201751f9439
9 hours[gn] port bf835169a52b7Nico Weber1-4/+1
9 hours[gn] port 60bdf0965441Nico Weber1-0/+2
10 hours[InstCombine] Generalise optimisation of redundant floating point comparisons with `ConstantFPRange` (#159315)Rajveer Singh Bharadwaj2-32/+44
Follow up of #158097. Similar to `simplifyAndOrOfICmpsWithConstants`, we can do so for floating point comparisons.
11 hours[OpenMP] Allow Fortran tests (#150722)Michael Kruse1-0/+4
In addition to existing C/C++ tests, add Fortran-based tests. Fortran tests will only run if a Fortran compiler is found. The first test is for the unroll construct added in #144785.
12 hours[llvm][test][CGPluginTest] Fix plugin path (#159914)Raul Tambre1-1/+1
During development I introduced the `%llvm_obj_root` substitution but later removed it as a better solution became apparent. Revert this to the original substitution while keeping the new path. Fixes: 4e1c996674cc340f290b0a528e2038e76494d8d4
16 hours[ValueTracking] a - b == NonZero -> a != b (#159792)Yingwei Zheng2-1/+207
Alive2: https://alive2.llvm.org/ce/z/8rX5Rk Closes https://github.com/llvm/llvm-project/issues/118106.
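The fold in the commit title rests on a property of wrapping subtraction: `a - b` is zero exactly when `a == b`, so a difference known to be non-zero implies `a != b`. The sketch below checks this exhaustively for 8-bit values; it is an illustration of the arithmetic fact only, not of the ValueTracking code, and the function name is hypothetical.

```cpp
#include <cstdint>

// Returns true if, for every pair of 8-bit values, the wrapped difference
// is non-zero exactly when the operands differ.
inline bool subNonZeroMatchesInequality() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned B = 0; B < 256; ++B) {
      // Truncating to 8 bits models wrap-around subtraction on i8.
      std::uint8_t Diff = static_cast<std::uint8_t>(A - B);
      if ((Diff != 0) != (A != B))
        return false;
    }
  }
  return true;
}
```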
17 hours[RISCV] Fix typo in comment. NFCCraig Topper1-1/+1
18 hours[lit] Add support for deleting symlinks to directories without -rAiden Grossman4-1/+24
Before this change, rm would assume that a symlink to a directory was actually a directory and require the recursive flag to be passed, differing from other shells. Given the change in lit is about the same length as the test change would be (minus tests), I think it makes sense to just support this in the internal shell.

Reviewers: cmtice, petrhosek, ilovepi
Reviewed By: petrhosek, cmtice, ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/158464
18 hours[lit] Make builtin cat work with stdinAiden Grossman2-0/+7
According to POSIX, cat with no files passed to it is supposed to read from STDIN. Because the builtin cat lacked this behavior, the clang test dev-fd-fs.c, which relies on it, failed. This is a simple modification, and I do not think the test can easily be rewritten to avoid it while preserving the semantics around named pipes.

Reviewers: petrhosek, arichardson, ilovepi, cmtice, jh7370
Reviewed By: jh7370, arichardson, ilovepi, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/158447
20 hoursPPC: Fix regression for 32-bit ppc with 64-bit support (#159893)Matt Arsenault2-1/+67
Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410. e5500 accidentally still had the 64bit feature applied instead of 64bit-support.
21 hours[TableGen][DecoderEmitter] Rework table construction/emission (#155889)Sergei Barannikov19-774/+967
### Current state

We have the FilterChooser class, which can be thought of as a **tree of encodings**. Tree nodes are instances of FilterChooser itself, and come in two types:
* A node containing a single encoding that has *constant* bits in the specified bit range, a.k.a. a singleton node.
* A node containing only child nodes, where each child represents a set of encodings that have the same *constant* bits in the specified bit range.

Either of these nodes can have an additional child, which represents a set of encodings that have some *unknown* bits in the same bit range.

As can be seen, the **data structure is very high level**.

The encoding tree represented by FilterChooser is then converted into a finite-state machine (FSM), represented as a **byte array**. The translation is straightforward: for each node of the tree we emit a sequence of opcodes that check encoding bits and predicates for each encoding. For a singleton node we also emit a terminal "decode" opcode.

The translation is done in one go, and this has negative consequences:
* We miss optimization opportunities.
* We have to use "fixups" when encoding transitions in the FSM since we don't know the size of the data we want to jump over in advance. We have to emit the data first and then fix up the location of the jump. This means the fixup size has to be large enough to encode the longest jump, so **most of the transitions are encoded inefficiently**.
* Finally, when converting the FSM into human readable form, we have to **decode the byte array we've just emitted**. This is also done in one go, so we **can't do any pretty printing**.

### This PR

We introduce an intermediary data structure, the decoder tree, that can be thought of as an **AST of the decoder program**. This data structure is **low level** and as such allows for optimization and analysis. It resolves all the issues listed above. We now can:
* Emit more optimal opcode sequences.
* Compute the size of the data to be emitted in advance, avoiding fixups.
* Do pretty printing.

Serialization is done by a new class, DecoderTableEmitter, which converts the AST into an FSM in **textual form**, streamed right into the output file.

### Results

* The new approach immediately resulted in 12% total table size savings across all in-tree targets, without implementing any optimizations on the AST. Many tables observe ~20% size reduction.
* The generated file is much more readable.
* The implementation is arguably simpler and more straightforward (the diff is only +150~200 lines, which feels rather small for the benefits the change gives).
21 hours[CodeGen] Untangle RegisterCoalescer from LRE's ScannedRemattable flag [nfc] (#159839)Philip Reames3-23/+11
LiveRangeEdit's rematerialization checking logic is used in two quite different ways. For SplitKit and InlineSpiller, we're analyzing all defs associated with a live interval, doing that analysis up front, and then using the result a bit later. For the RegisterCoalescer, we're analyzing exactly one ValNo at a time, and using the legality result immediately. LRE had a checkRematerializable which existed basically to adapt the latter into the former usage model. Instead, this change bypasses the ScannedRemat and Remattable structures, and directly queries the underlying routines. This is easy to read, and makes it clearer which uses actually need the deferred analysis. (A following change may try to unwind that too, but it's not strictly NFC.)
21 hoursRevert "[ELF][LLDB] Add an nvsass triple (#159459)" (#159879)Joseph Huber7-28/+9
Summary: This patch has broken the `libc` build bot. I could work around that but the changes seem unnecessary. This reverts commit 9ba844eb3a21d461c3adc7add7691a076c6992fc.
21 hours[MC] Make `-mcpu=native` test target specific (#159868)Cameron McInally2-4/+5
It's not possible to use `-mcpu=native` when the Target's Triple doesn't match the Host's. Move this test to the X86 directory so that it isn't run while cross-compiling. Originally #159414 --------- Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
22 hoursX86: Elide use of RegClassByHwMode in some ptr_rc_tailcall uses (#159874)Matt Arsenault2-4/+4
Different instructions are used for the 32-bit and 64-bit cases anyway, so directly use the concrete register class in the instruction.
22 hours[M68k] Remove STI from M68kAsmParser (#159827)Sergei Barannikov1-3/+2
STI exists in the base class, use it instead. Fixes #159862.
22 hours[TableGen] Remove unused Target from InstructionEncoding methods (NFC) (#159833)Sergei Barannikov3-32/+23
23 hoursReland [BasicBlockUtils] Handle funclets when detaching EH pad blocks (#159379)Gábor Spaits2-28/+305
Fixes #148052.

The last PR did not account for the scenario where more than one instruction used the `catchpad` label. In that case I deleted uses that had already been "chosen to be iterated over" by the early increment iterator. This issue was not visible in a normal release build on x86, but luckily the address sanitizer build on the buildbot later found it.

Here is the diff from the last version of this PR: #158435

```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
       // first block, the we would have possible cleanupret and catchret
       // instructions with poison arguments, which wouldn't be valid.
       if (isa<FuncletPadInst>(I)) {
-        for (User *User : make_early_inc_range(I.users())) {
+        SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+        for (User *User : I.users()) {
           Instruction *ReturnInstr = dyn_cast<Instruction>(User);
           // If we have a cleanupret or catchret block, replace it with just an
           // unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
           if (isa<CatchReturnInst>(ReturnInstr) ||
               isa<CleanupReturnInst>(ReturnInstr)) {
             BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
-            // This catchret or catchpad basic block is detached now. Let the
-            // successors know it.
-            // This basic block also may have some predecessors too. For
-            // example the following LLVM-IR is valid:
-            //
-            //   [cleanuppad_block]
-            //            |
-            //    [regular_block]
-            //            |
-            //   [cleanupret_block]
-            //
-            // The IR after the cleanup will look like this:
-            //
-            //   [cleanuppad_block]
-            //            |
-            //    [regular_block]
-            //            |
-            //     [unreachable]
-            //
-            // So regular_block will lead to an unreachable block, which is also
-            // valid. There is no need to replace regular_block with unreachable
-            // in this context now.
-            // On the other hand, the cleanupret/catchret block's successors
-            // need to know about the deletion of their predecessors.
-            emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+            UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
           }
         }
+        for (BasicBlock *EHRetBB :
+             make_early_inc_range(UniqueEHRetBlocksToDelete))
+          emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
       }
     }
```
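The shape of the fix in the diff above (first collect the distinct parent blocks, then process each once) can be sketched with standard containers. This is a simplified model under stated assumptions: `Block`, `User`, and `detachUniqueParents` are hypothetical stand-ins, and `std::set` plays the role of `SmallPtrSet`; none of this is LLVM API.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-ins: a block, and a catchret/cleanupret-like user that
// lives inside a block.
struct Block {
  int Id;
  bool Detached = false;
};
struct User {
  Block *Parent;
};

// Collect every parent block exactly once, then detach each of them.
// Mutating while walking the user list is avoided, so two users that share
// a parent block cannot cause the block to be processed twice.
inline int detachUniqueParents(const std::vector<User> &Users) {
  std::set<Block *> Unique;
  for (const User &U : Users)
    Unique.insert(U.Parent);

  int DetachCalls = 0;
  for (Block *B : Unique) {
    B->Detached = true; // stands in for emptyAndDetachBlock
    ++DetachCalls;
  }
  return DetachCalls;
}
```

With two users in the same block and one in another, the detach step runs twice rather than three times, which mirrors the case the reland describes.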
23 hours[libc] Fix libc build on NVPTX using wrong linker flagJoseph Huber1-1/+9
Summary: Ugly hacks abound, we can't actually test linker flags correctly generically because not everyone has `nvlink` as a binary on their machine which would then result in every single flag being unsupported. This is the only 'linker flag' check we have, so just hard code it off.
24 hours[MCA] Enable customization of individual instructions (#155420)Roman Belenov11-21/+175
Currently MCA takes instruction properties from the scheduling model. However, some instructions may execute differently depending on external factors; for example, the latency of memory instructions may vary depending on whether the load comes from L1 cache, L2 or DRAM. While MCA as a static analysis tool cannot model such differences (and currently makes a static decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. the sensitivity of code performance to cache misses in a particular load instruction). This patch addresses this need.

The library modification is intentionally generic: arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (a single number applies to all output arguments and MaxLatency) via comments in the input assembler code; the format is like this:

    add (%eax), eax // LLVM-MCA-LATENCY:100

Users of the MCA library can already make additional customizations; the command line tool can be extended in the future.

Note that InstructionView currently shows per-instruction information according to the scheduling model and is not affected by this change.

See https://github.com/llvm/llvm-project/issues/133429 for additional clarifications (including an explanation of why existing customization mechanisms do not provide the required functionality).

---------

Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
24 hours[SampleProfile] Always use FAM to get OREAiden Grossman1-14/+9
The split in this code path was left over from when we had to support the old PM and the new PM at the same time. Now that the legacy pass has been dropped, this simplifies the code a little bit and swaps pointers for references in a couple places.

Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
24 hours[RISCV] Update comments in RISCVMatInt to reflect we don't always use ADDIW after LUI now. NFC (#159829)Craig Topper1-14/+15
The simm32 base case only uses lui+addiw when necessary after 3d2650bdeb8409563d917d8eef70b906323524ef.

The worst case 8 instruction sequence doesn't leave a full 32 bits for the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created. So we will never generate LUI+ADDIW in the worst case sequence.
24 hours[SROA] Use tree-structure merge to remove alloca (#152793)Chengjun3-7/+829
This patch introduces a new optimization in SROA that handles the pattern where multiple non-overlapping vector `store`s completely fill an `alloca`.

The current approach to handle this pattern introduces many `.vecexpand` and `.vecblend` instructions, which can dramatically slow down compilation when dealing with large `alloca`s built from many small vector `store`s. For example, consider an `alloca` of type `<128 x float>` filled by 64 `store`s of `<2 x float>` each. The current implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain

This kind of IR is both difficult to optimize and slow to compile, particularly impacting the `InstCombine` pass.

This patch introduces a tree-structured merge approach that significantly reduces the number of operations and improves compilation performance.

Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored values
- Reduces the number of intermediate operations compared to linear merging
- Eliminates the long dependency chains that hurt optimization

Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca

// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```

Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns

For some large cases, the compile time can be reduced from about 60s to less than 3s.

---------

Co-authored-by: chengjunp <chengjunp@nividia.com>
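The pairwise merge described above can be modeled outside of LLVM IR. The following is a rough sketch, assuming the stored pieces have already been ordered by offset; `Chunk`, `concat`, `MergeResult`, and `treeMerge` are hypothetical names, and each `concat` call stands in for one `shufflevector`. It is not the SROA implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Chunk = std::vector<float>;

// Concatenate two pieces; models a single shufflevector that joins them.
inline Chunk concat(const Chunk &A, const Chunk &B) {
  Chunk R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

struct MergeResult {
  Chunk Value;       // the fully merged vector
  std::size_t Depth; // number of merge rounds (length of dependency chain)
  std::size_t Ops;   // number of pairwise merges performed
};

// Merge neighbours pairwise, round by round, until one piece remains.
// Parts is assumed to be sorted by offset already.
inline MergeResult treeMerge(std::vector<Chunk> Parts) {
  std::size_t Depth = 0, Ops = 0;
  while (Parts.size() > 1) {
    std::vector<Chunk> Next;
    for (std::size_t I = 0; I + 1 < Parts.size(); I += 2) {
      Next.push_back(concat(Parts[I], Parts[I + 1]));
      ++Ops;
    }
    if (Parts.size() % 2 == 1) // odd piece is carried to the next round
      Next.push_back(std::move(Parts.back()));
    Parts = std::move(Next);
    ++Depth;
  }
  return {Parts.empty() ? Chunk{} : std::move(Parts.front()), Depth, Ops};
}
```

For the 64-store example in the message, this model performs 63 merges in 6 rounds, compared with the chain of 64 expand and 64 blend operations that the linear approach builds.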
24 hours[DependenceAnalysis] Extending SIV to handle fusable loops (#128782)Alireza Torabian3-206/+705
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
24 hours[lit] Add support for readfile to external shellAiden Grossman4-2/+56
This patch adds support for the new lit %{readfile:<filename>} substitution to the external shell. The implementation currently just appends some test commands to ensure the file exists and uses a subshell with cat. This is intended to enable running tests using the substitution in the external shell before we fully switch over to the internal shell.

This code is designed to be temporary with us deleting it once everything has migrated over to the internal shell and we are able to remove the external shell code paths.

Reviewers: petrhosek, cmtice, pogo59, ilovepi, arichardson
Reviewed By: cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/159431
25 hours[AMDGPU] ds_read2/ds_write2 gfx1250 tests. NFC (#159824)Stanislav Mekhanoshin2-0/+1194
25 hours[CodeGenPrepare] Consider target memory intrinics as memory use (#159638)Jeffrey Byrnes2-11/+22
When deciding to sink address instructions into their uses, we check if it is profitable to do so. The profitability check is based on the types of uses of this address instruction -- if there are users which are not memory instructions, then do not fold. However, this profitability check wasn't considering target intrinsics, which may be loads / stores. This adds some logic to handle target memory intrinsics.
26 hoursRevert "[PowerPC] clean unused PPC target feature FeatureBPERMD" (#159837)Sergei Barannikov1-1/+4
Reverts llvm/llvm-project#159782 The PR breaks multiple build bots and CI as well.
26 hours[KnownBits] Add setAllConflict to set all bits in Zero and One. NFC (#159815)Craig Topper5-36/+28
This is a common pattern for initializing KnownBits that occurs before loops that call intersectWith.
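Why the all-conflict state is the natural starting point for such a loop can be shown with a miniature model. `MiniKnownBits` and `commonBits` below are hypothetical 8-bit stand-ins, not the LLVM `KnownBits` class; the sketch assumes `intersectWith` keeps only the facts both sides agree on (a bitwise AND of the `Zero` and `One` masks), which makes "every bit set in both masks" the identity element.

```cpp
#include <cstdint>
#include <vector>

// Miniature 8-bit model: Zero marks bits known to be 0, One marks bits
// known to be 1.
struct MiniKnownBits {
  std::uint8_t Zero = 0;
  std::uint8_t One = 0;

  // Every bit set in both masks: the identity for intersectWith.
  void setAllConflict() {
    Zero = 0xFF;
    One = 0xFF;
  }

  // Keep only the facts that hold on both sides.
  MiniKnownBits intersectWith(const MiniKnownBits &RHS) const {
    return {static_cast<std::uint8_t>(Zero & RHS.Zero),
            static_cast<std::uint8_t>(One & RHS.One)};
  }

  static MiniKnownBits makeConstant(std::uint8_t C) {
    return {static_cast<std::uint8_t>(~C), C};
  }
};

// The pattern from the commit message: start from all-conflict, then
// intersect with the known bits of each value in a loop.
inline MiniKnownBits commonBits(const std::vector<std::uint8_t> &Values) {
  MiniKnownBits Known;
  Known.setAllConflict();
  for (std::uint8_t V : Values)
    Known = Known.intersectWith(MiniKnownBits::makeConstant(V));
  return Known;
}
```

For the inputs `0x0C` and `0x0A`, only bit 3 is known to be one and bits 0, 4, 5, 6, 7 are known to be zero. With an empty input the result stays in the all-conflict state, so a caller of this sketch would need to handle that case separately.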
26 hours[LV] Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.Florian Hahn2-56/+14
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. This matches the behavior in VPReplicateRecipe::computeCost.
27 hours[LV] Add additional test for replicating store costs.Florian Hahn2-30/+508
Add tests for costing replicating stores with x86_fp80, scalarizing costs after discarding interleave groups and cost when preferring vector addressing.
28 hours[AArch64] Clean up the formatting of some bitconvert patterns. NFCDavid Green1-145/+144
28 hours[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717)AZero1311-189/+119
Just do a custom lowering instead. Also copy paste the cmov-neg fold to prevent regressions in nabs.
28 hours[gn] port a513b701752b1Nico Weber2-2/+4
28 hours[QualGroup] Update Slides Section, Add AI Transcription Policy, Clean Up (#158842)Wendi4-22/+60
This patch makes the following updates to the `QualGroup.rst` documentation:

✅ 1. Replace slide links with Google Drive URLs
Replaced links to slide PDFs previously hosted in `llvm/docs/qual-wg/` with publicly accessible links to the same files stored on Google Drive.

✅ 2. Remove duplicated "Current Topics & Backlog" section
Removed an accidental duplication of the "Current Topics & Backlog" section to improve clarity and structure.

✅ 3. Add "AI Transcription Policy" section
Introduced a dedicated section documenting the group's practices and expectations regarding AI-based auto-transcription during sync-up meetings. Includes purpose, consent practices, retention details, and how participants can opt out or raise concerns.

✅ 4. Remove `qual-wg` subfolder from `docs`
Removed the now-unused `llvm/docs/qual-wg` directory after migrating slide hosting off-repo. No longer needed for qualification group documentation.

✅ 5. Revision of the introduction
Updated sentence to reflect the most current and widely relevant safety standards: adding IEC 61508 and IEC 62304 for broader applicability, and replacing EN 50128 (older standard in railways) by EN 50716 for correctness.

---------

Co-authored-by: Wendi Urribarri (Woven by Toyota <wendi.urribarri@woven-planet.global>
28 hours[ELF][LLDB] Add an nvsass triple (#159459)Walter Erquinigo7-9/+28
When handling CUDA ELF files via objdump or LLDB, the ELF parser in LLVM needs to distinguish if an ELF file is sass or not, which requires a triple for sass to exist in llvm. This patch includes all the necessary changes for LLDB and objdump to correctly identify these files with the correct triple.
28 hours[LLVM] Simplify GPU runtimes flag handling (#159802)Joseph Huber1-6/+0
Summary: The AMDGPU hack can be removed, and we no longer need to skip 90% of the `HandleLLVMOptions` if we work around NVPTX earlier. Simplifies the interface by removing duplicated logic and keeps the GPU targets from being weirdly divergent on some flags.