|
**KEY POINTS**
- MVP workflow to automatically lint C++ files, located **only** in
`clang-tools-extra/clang-tidy`. This directory was chosen because
`clang-tools-extra/clang-tidy` is almost 100% `clang-tidy` compliant,
so the workflow automatically enforces a [high quality standard for
clang-tidy source
code](https://discourse.llvm.org/t/rfc-create-hardened-clang-tidy-config-for-clang-tidy-directory/87247).
(https://github.com/llvm/llvm-project/pull/147793)
- The implementation is very similar to the code-format job, but without the
ability to run `code-lint-helper.py` locally.
**FOUND ISSUES** + open questions
- Speed: it takes ~1m40s to download and unpack `clang-tidy`, plus an
additional ~4 min to configure and build the CodeGen targets. I see that
`premerge.yaml` runs on special `llvm-premerge-linux-runners` runners
which can use `sccache` for speed. Can we use the same runners for this
job? Exact timings can be found
[here](https://github.com/llvm/llvm-project/actions/runs/17135749067/job/48611150680?pr=154223).
**TO DO**
- Place common components from `code-lint-helper.py` and
`code-format-helper.py` into a separate file and reuse it in both CI jobs.
- Compute CodeGen targets based on `projects_to_build`; for now,
`clang-tablegen-targets` is hardcoded for `clang-tools-extra/`.
- Automatically generate and upload the `.yaml` for
`clang-apply-replacements`.
- Create an RFC with a plan for how to enable `clang-tidy` in other projects
so that maintainers of LLVM components can choose whether they want
`clang-tidy` or not.
- Add more linters, such as `pylint` and `ruff`, in the future.
|
|
|
|
Remove dce and instcombine from the test's RUN lines to make it easier to
check the output generated by LV.
|
|
Add additional tests benefiting from rewriting existing SCEVAddExprs with
guards.
|
|
Follow-up to https://reviews.llvm.org/D46527
|
|
combine() combines hash values with recursion on variadic parameters.
This patch replaces the recursion with a C++17 fold expression:
(combine_data(length, buffer_ptr, buffer_end, get_hashable_data(args)),
...);
which expands to:
combine_data(length, buffer_ptr, buffer_end, get_hashable_data(a));
combine_data(length, buffer_ptr, buffer_end, get_hashable_data(b));
combine_data(length, buffer_ptr, buffer_end, get_hashable_data(c));
...
A key benefit of this change is the unification of the recursive step
and the base case. The argument processing and finalization logic now
exist as straight-line code within a single function.
combine_data now takes buffer_ptr by reference. This is necessary
because the previous assignment pattern:
buffer_ptr = combine_data(...)
is syntactically incompatible with a fold expression. The new pattern:
(combine_data(...), ...)
discards return values, so combine_data must update buffer_ptr
directly.
For readability, this patch does the bare minimum to use a fold
expression, leaving further cleanups to subsequent patches. For
example, buffer_ptr and buffer_end could become member variables, and
several comments that mention recursion still need updating.
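
For illustration, here is a minimal standalone sketch of the same recursion-to-fold rewrite, assuming a toy mixing step; `combine_one` and the constants are stand-ins rather than the actual Hashing.h helpers:
```cpp
#include <cstdint>
#include <iostream>

// Toy stand-in for combine_data/get_hashable_data: mixes one value into the state.
static void combine_one(uint64_t &State, uint64_t Value) {
  State = (State ^ Value) * 1099511628211ULL; // FNV-style step, purely illustrative
}

// Pre-C++17 this would be a recursive variadic template plus a separate
// base-case overload. The fold expression turns it into straight-line code
// within a single function.
template <typename... Ts> uint64_t combine(const Ts &...Args) {
  uint64_t State = 14695981039346656037ULL;
  (combine_one(State, static_cast<uint64_t>(Args)), ...); // one call per argument, in order
  return State;
}

int main() { std::cout << combine(1, 2, 3) << '\n'; }
```
Note how the state is updated through a reference parameter, mirroring why combine_data now takes buffer_ptr by reference: the comma fold discards return values.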
|
|
|
|
This patch simplifies dispatchRecalculateHash and dispatchResetHash
with "constexpr if".
This patch does not inline dispatchRecalculateHash and
dispatchResetHash into their respective call sites. Using "constexpr
if" in a non-template context like MDNode::uniquify would still
require the discarded branch to be syntactically valid, causing a
compilation error for node types that do not have
recalculateHash/setHash. Using template functions ensures that the
"constexpr if" is evaluated in a proper template context, allowing the
compiler to fully discard the inactive branch.
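
A minimal sketch of the distinction described above, using made-up node types (`WithHash`/`WithoutHash` are illustrative, not the real MDNode subclasses):
```cpp
#include <type_traits>

struct WithHash { void recalculateHash() {} };
struct WithoutHash {};

// Because this is a template, the discarded branch of "if constexpr" is never
// instantiated, so the call to recalculateHash() is fine even for node types
// that do not have such a member.
template <typename NodeT> void dispatchRecalculateHash(NodeT &N) {
  if constexpr (std::is_same_v<NodeT, WithHash>)
    N.recalculateHash();
  // else: nothing to do for nodes without a cached hash
}

int main() {
  WithHash A;
  WithoutHash B;
  dispatchRecalculateHash(A); // active branch calls recalculateHash()
  dispatchRecalculateHash(B); // branch discarded; would not compile in a non-template
}
```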
|
|
This patch modernizes HasCachedHash.
- "struct SFINAE" is replaced with identically defined SameType.
- The return types Yes and No are replaced with std::true_type and
std::false_type.
My previous attempt (#159510) to clean up HasCachedHash failed on
clang++-18, but this version works with clang++-18.
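
A minimal sketch of the detection idiom in the modernized style; the member name `setHash(unsigned)` and the trait name are assumptions for illustration only:
```cpp
#include <type_traits>

// Declaration-only helper that pins a member pointer to an exact signature.
template <typename T, T> struct SameType;

template <typename NodeT> struct HasCachedHashSketch {
  // The preferred overload participates only if NodeT has setHash(unsigned);
  // the return types are std::true_type/std::false_type instead of
  // hand-rolled Yes/No character buffers.
  template <typename U>
  static std::true_type check(SameType<void (U::*)(unsigned), &U::setHash> *);
  template <typename U> static std::false_type check(...);

  static constexpr bool value = decltype(check<NodeT>(nullptr))::value;
};

struct Hashed { void setHash(unsigned) {} };
struct Plain {};

static_assert(HasCachedHashSketch<Hashed>::value, "setHash detected");
static_assert(!HasCachedHashSketch<Plain>::value, "no setHash member");

int main() {}
```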
|
|
This patch moves IsSizeLessThanThresholdT into AdjustedParamTBase, the
sole user of the helper, while switching to a type alias.
Aside from moving the helper closer to where it's used, another
benefit is that we can assume that T is a complete type inside
AdjustedParamTBase. Note that sizeof(T) serves as a check for a
complete type. Inside AdjustedParamTBase, we only pass complete
non-void types to:
std::is_trivially_copy_constructible<T>
std::is_trivially_move_constructible<T>
so we can safely drop the fallback case implemented with
std::false_type.
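
A small sketch of the completeness point: `sizeof(T)` only compiles for complete types, so a trait evaluated where T is known to be complete needs no `std::false_type` fallback (the trait name and threshold here are illustrative):
```cpp
#include <type_traits>

// Evaluating sizeof(T) is itself the "complete type" check; an incomplete T
// would be a hard error rather than silently selecting a fallback.
template <typename T>
using IsSizeLessThanThreshold =
    std::bool_constant<(sizeof(T) <= 2 * sizeof(void *))>;

struct Small { int X; };
struct Big { char Data[1024]; };

static_assert(IsSizeLessThanThreshold<Small>::value, "small type passes by value");
static_assert(!IsSizeLessThanThreshold<Big>::value, "large type passes by reference");

int main() {}
```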
|
|
Summary:
This somehow snuck back in.
|
|
There are a couple of places during function cloning where we may create
new callsite clone nodes. One of those places was correctly propagating
the assignment to which function clone it should call, and one was not.
Refactor this handling into a helper and use it in both places so the newly
created callsite clones actually call the assigned callee function
clones.
|
|
I forgot to remove a bunch of the intermediary path.
That's what I get for not waiting for my local build to finish.
Fixes: 47c1b650626043f0a8f8e32851617201751f9439
|
|
|
|
|
|
comparisons with `ConstantFPRange` (#159315)
Follow-up to #158097.
As with `simplifyAndOrOfICmpsWithConstants`, we can do the same for
floating-point comparisons.
|
|
In addition to existing C/C++ tests, add Fortran-based tests. Fortran
tests will only run if a Fortran compiler is found. The first test is
for the unroll construct added in #144785.
|
|
During development I introduced the `%llvm_obj_root` substitution but later removed it as a better
solution became apparent. Revert this to the original substitution while keeping the new path.
Fixes: 4e1c996674cc340f290b0a528e2038e76494d8d4
|
|
Alive2: https://alive2.llvm.org/ce/z/8rX5Rk
Closes https://github.com/llvm/llvm-project/issues/118106.
|
|
|
|
Before this change, rm would assume that a symlink to a directory was
actually a directory and require the recursive flag to be passed,
differing from other shells. Given the change in lit is about the same
length as the test change would be (minus tests), I think it makes sense
to just support this in the internal shell.
Reviewers: cmtice, petrhosek, ilovepi
Reviewed By: petrhosek, cmtice, ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/158464
|
|
cat with no files passed to it is supposed to read from STDIN according
to POSIX. The builtin cat lacking this behavior caused the clang test
dev-fd-fs.c to fail because it expected this behavior. This is a simple
modification, and I do not think the test can easily be rewritten without
it while preserving the semantics around named pipes.
Reviewers: petrhosek, arichardson, ilovepi, cmtice, jh7370
Reviewed By: jh7370, arichardson, ilovepi, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/158447
|
|
Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410.
e5500 accidentally still had the 64bit feature applied instead of
64bit-support.
|
|
### Current state
We have FilterChooser class, which can be thought of as a **tree of
encodings**. Tree nodes are instances of FilterChooser itself, and come
in two types:
* A node containing a single encoding that has *constant* bits in the
specified bit range, a.k.a. a singleton node.
* A node containing only child nodes, where each child represents a set
of encodings that have the same *constant* bits in the specified bit
range.
Either of these nodes can have an additional child, which represents a
set of encodings that have some *unknown* bits in the same bit range.
As can be seen, the **data structure is very high level**.
The encoding tree represented by FilterChooser is then converted into a
finite-state machine (FSM), represented as a **byte array**. The
translation is straightforward: for each node of the tree we emit a
sequence of opcodes that check encoding bits and predicates for each
encoding. For a singleton node we also emit a terminal "decode" opcode.
The translation is done in one go, and this has negative consequences:
* We miss optimization opportunities.
* We have to use "fixups" when encoding transitions in the FSM since we
don't know the size of the data we want to jump over in advance. We have
to emit the data first and then fix up the location of the jump. This
means the fixup size has to be large enough to encode the longest jump,
so **most of the transitions are encoded inefficiently**.
* Finally, when converting the FSM into human readable form, we have to
**decode the byte array we've just emitted**. This is also done in one
go, so we **can't do any pretty printing**.
### This PR
We introduce an intermediate data structure, the decoder tree, that can be
thought of as the **AST of the decoder program**.
This data structure is **low level** and as such allows for optimization
and analysis.
It resolves all the issues listed above. We now can:
* Emit more optimal opcode sequences.
* Compute the size of the data to be emitted in advance, avoiding
fixups.
* Do pretty printing.
Serialization is done by a new class, DecoderTableEmitter, which
converts the AST into an FSM in **textual form**, streamed right into the
output file.
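As an illustration of why a tree with precomputed node sizes avoids fixups, here is a hypothetical, heavily simplified sketch (node kinds, opcode names, and sizes are made up and do not reflect the actual TableGen classes):
```cpp
#include <cstdint>
#include <iostream>
#include <memory>

// Each node of the (illustrative) decoder tree knows how many bytes it will
// occupy once serialized, so a parent can emit exact jump offsets up front
// instead of patching them in with fixups afterwards.
struct DecoderNode {
  virtual ~DecoderNode() = default;
  virtual uint64_t encodedSize() const = 0;
  virtual void emit(std::ostream &OS) const = 0;
};

struct DecodeLeaf : DecoderNode {
  unsigned Opcode;
  explicit DecodeLeaf(unsigned Op) : Opcode(Op) {}
  uint64_t encodedSize() const override { return 2; } // opcode byte + operand byte
  void emit(std::ostream &OS) const override {
    OS << "  OPC_Decode " << Opcode << "\n";
  }
};

struct CheckFieldNode : DecoderNode {
  unsigned StartBit = 0, NumBits = 0, Value = 0;
  std::unique_ptr<DecoderNode> Match, Fallthrough;
  uint64_t encodedSize() const override {
    // Check opcode + operands + both children; known before anything is emitted.
    return 4 + Match->encodedSize() + Fallthrough->encodedSize();
  }
  void emit(std::ostream &OS) const override {
    OS << "  OPC_CheckField " << StartBit << ", " << NumBits << ", " << Value
       << ", on mismatch skip " << Match->encodedSize() << " bytes\n";
    Match->emit(OS);
    Fallthrough->emit(OS);
  }
};

int main() {
  CheckFieldNode Root;
  Root.StartBit = 25; Root.NumBits = 7; Root.Value = 0x33;
  Root.Match = std::make_unique<DecodeLeaf>(1);
  Root.Fallthrough = std::make_unique<DecodeLeaf>(2);
  Root.emit(std::cout); // human-readable output, no post-hoc decoding needed
}
```
Because encodedSize() is known before emission, a parent can print the exact number of bytes to skip on a failed check, and the table can be streamed straight to the output in readable form.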
### Results
* The new approach immediately resulted in a 12% reduction in total table size
across all in-tree targets, without implementing any optimizations on
the AST. Many tables see a ~20% size reduction.
* The generated file is much more readable.
* The implementation is arguably simpler and more straightforward (the
diff is only +150~200 lines, which feels rather small for the benefits
the change gives).
|
|
(#159839)
LiveRangeEdit's rematerialization checking logic is used in two quite
different ways. For SplitKit and InlineSpiller, we're analyzing all defs
associated with a live interval, doing that analysis up front, and then
using the result a bit later. In the RegisterCoalescer, we're analyzing
exactly one ValNo at a time and using the legality result immediately.
LRE had a checkRematerializable which existed basically to adapt the
latter into the former usage model.
Instead, this change bypasses the ScannedRemat and Remattable
structures and directly queries the underlying routines. This is easier
to read and makes it clearer which uses actually need the
deferred analysis. (A follow-up change may try to unwind that too, but
it's not strictly NFC.)
|
|
Summary:
This patch has broken the `libc` build bot. I could work around that but
the changes seem unnecessary.
This reverts commit 9ba844eb3a21d461c3adc7add7691a076c6992fc.
|
|
It's not possible to use `-mcpu=native` when the Target's Triple doesn't
match the Host's. Move this test to the X86 directory so that it isn't
run while cross-compiling.
Originally #159414
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
|
|
Different instructions are used for the 32-bit and 64-bit cases
anyway, so directly use the concrete register class in the
instruction.
|
|
STI exists in the base class; use it instead.
Fixes #159862.
|
|
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one instruction
used the `catchpad` label.
In that case I deleted uses that had already been chosen to be
iterated over by the early-increment iterator. This issue was not
visible in a normal release build on x86, but luckily the address
sanitizer build later caught it on the buildbot.
Here is the diff from the last version of this PR (#158435):
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
|
|
Summary:
Ugly hacks abound: we can't actually test linker flags generically
because not everyone has `nvlink` as a binary on their machine, which
would result in every single flag being reported as unsupported.
This is the only 'linker flag' check we have, so just hard-code it off.
|
|
Currently MCA takes instruction properties from the scheduling model.
However, some instructions may execute differently depending on external
factors - for example, the latency of memory instructions may vary
depending on whether the load comes from the L1 cache, L2, or
DRAM. While MCA as a static analysis tool cannot model such differences
(and currently makes some static decision, e.g. all memory ops are
treated as L1 accesses), it makes sense to allow manual modification of
instruction properties to model different behavior (e.g. the sensitivity of
code performance to cache misses in a particular load instruction). This
patch addresses this need.
The library modification is intentionally generic - arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (a single number applies to all
output arguments and MaxLatency) via comments in the input assembly
code; the format looks like this:
add (%eax), eax // LLVM-MCA-LATENCY:100
Users of the MCA library can already make additional customizations; the
command-line tool can be extended in the future.
Note that InstructionView currently shows per-instruction information
according to the scheduling model and is not affected by this change.
See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including an explanation of why existing customization
mechanisms do not provide the required functionality).
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
|
|
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
|
|
after LUI now. NFC (#159829)
The simm32 base case only uses lui+addiw when necessary after
3d2650bdeb8409563d917d8eef70b906323524ef
The worst case 8 instruction sequence doesn't leave a full 32 bits for
the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created. So
we will never generate LUI+ADDIW in the worst case sequence.
|
|
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca
// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
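For illustration, a minimal sketch of the tree-structured merge using plain `std::vector` stand-ins for the stored values; in SROA the `concat` step would be a `shufflevector` and the inputs the (offset-sorted) stored vector values:
```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative stand-in for an SSA value holding a small vector; concat()
// plays the role of a shufflevector that concatenates its two operands.
using Vec = std::vector<float>;

static Vec concat(const Vec &A, const Vec &B) {
  Vec R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

// Tree-structured merge: instead of blending each stored slice into one wide
// value (a linear chain of N operations on full-width masks), merge adjacent
// slices pairwise, halving the number of values per round. Depth is O(log N).
static Vec treeMerge(std::vector<Vec> Slices) {
  assert(!Slices.empty());
  while (Slices.size() > 1) {
    std::vector<Vec> Next;
    for (size_t I = 0; I + 1 < Slices.size(); I += 2)
      Next.push_back(concat(Slices[I], Slices[I + 1]));
    if (Slices.size() % 2) // an odd slice carries over to the next round
      Next.push_back(Slices.back());
    Slices = std::move(Next);
  }
  return Slices.front();
}

int main() {
  // Four <2 x float> stores covering an <8 x float> alloca, sorted by offset.
  std::vector<Vec> Stores = {{0, 1}, {2, 3}, {4, 5}, {6, 7}};
  for (float F : treeMerge(Stores))
    std::cout << F << ' ';
  std::cout << '\n'; // 0 1 2 3 4 5 6 7
}
```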
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
|
|
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
|
|
This patch adds support for the new lit %{readfile:<filename>}
substitution to the external shell. The implementation currently just
appends some test commands to ensure the file exists and uses a subshell
with cat. This is intended to enable running tests using the
substitution in the external shell before we fully switch over to the
internal shell.
This code is designed to be temporary with us deleting it once
everything has migrated over to the internal shell and we are able to
remove the external shell code paths.
Reviewers: petrhosek, cmtice, pogo59, ilovepi, arichardson
Reviewed By: cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/159431
|
|
|
|
When deciding to sink address instructions into their uses, we check if
it is profitable to do so. The profitability check is based on the types
of uses of this address instruction -- if there are users which are not
memory instructions, then do not fold.
However, this profitability check wasn't considering target intrinsics,
which may be loads / stores.
This adds some logic to handle target memory intrinsics.
|
|
Reverts llvm/llvm-project#159782
The PR breaks multiple buildbots as well as CI.
|
|
This is a common pattern for initializing KnownBits that occurs before
loops that call intersectWith.
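
A minimal sketch of the pattern being referred to, assuming a hypothetical helper that computes the bits known in common across several inputs (the helper itself is illustrative; only the seed-then-intersect shape matters):
```cpp
#include "llvm/Support/KnownBits.h"
#include <array>

using namespace llvm;

// Seed with the "conflict" state (every bit claimed both known-zero and
// known-one) so that intersectWith() over all inputs yields exactly the bits
// known in common, without special-casing the first iteration.
static KnownBits commonKnownBits(const std::array<KnownBits, 3> &Inputs) {
  KnownBits Known(Inputs[0].getBitWidth());
  Known.Zero.setAllBits();
  Known.One.setAllBits();
  for (const KnownBits &In : Inputs)
    Known = Known.intersectWith(In);
  return Known;
}
```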
|
|
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.
This matches the behavior in VPReplicateRecipe::computeCost.
|
|
Add tests for costing replicating stores with x86_fp80, scalarizing
costs after discarding interleave groups and cost when preferring vector
addressing.
|
|
|
|
Just do a custom lowering instead.
Also copy paste the cmov-neg fold to prevent regressions in nabs.
|
|
|
|
(#158842)
This patch makes the following updates to the `QualGroup.rst`
documentation:
✅ 1. Replace slide links with Google Drive URLs
Replaced links to slide PDFs previously hosted in `llvm/docs/qual-wg/`
with publicly accessible links to the same files stored on Google Drive.
✅ 2. Remove duplicated "Current Topics & Backlog" section
Removed an accidental duplication of the "Current Topics & Backlog"
section to improve clarity and structure.
✅ 3. Add "AI Transcription Policy" section
Introduced a dedicated section documenting the group's practices and
expectations regarding AI-based auto-transcription during sync-up
meetings. Includes purpose, consent practices, retention details, and
how participants can opt out or raise concerns.
✅ 4. Remove `qual-wg` subfolder from `docs`
Removed the now-unused `llvm/docs/qual-wg` directory after migrating
slide hosting off-repo. No longer needed for qualification group
documentation.
✅ 5. Revision of the introduction
Updated the sentence to reflect the most current and widely relevant safety
standards: adding IEC 61508 and IEC 62304 for broader applicability, and
replacing EN 50128 (the older railway standard) with EN 50716 for
correctness.
---------
Co-authored-by: Wendi Urribarri (Woven by Toyota) <wendi.urribarri@woven-planet.global>
|
|
When handling CUDA ELF files via objdump or LLDB, the ELF parser in LLVM
needs to distinguish whether an ELF file is SASS or not, which requires a
triple for SASS to exist in LLVM. This patch includes all the necessary
changes for LLDB and objdump to correctly identify these files with the
correct triple.
|
|
Summary:
The AMDGPU hack can be removed, and we no longer need to skip 90% of the
`HandleLLVMOptions` if we work around NVPTX earlier. Simplifies the
interface by removing duplicated logic and keeps the GPU targets from
being weirdly divergent on some flags.
|