Age | Commit message (Collapse) | Author | Files | Lines |
|
While we do not want to form actual lookup tables early, we do want to
perform some optimizations, as they may enable inlining of the much
simpler form.
Builds on https://github.com/llvm/llvm-project/pull/156477, which
originally included this change as well. This PR makes two changes on
top of it:
* Do not perform the optimization early if it requires adding a mask
check. These make the resulting IR less analyzable.
* Add a new SimplifyCFG option that controls switch-to-arithmetic
conversion separately from switch-to-lookup conversion. Enable the new
flag at the end of the function simplification pipeline. This means that
we attempt the arithmetic conversion before inlining, but avoid it in
the early pipeline, where it may lose information.
|
|
Introduce `AllocToken`, an instrumentation pass designed to provide
tokens to memory allocators enabling various heap organization
strategies, such as heap partitioning.
Initially, the pass instruments functions marked with a new attribute
`sanitize_alloc_token` by rewriting allocation calls to include a token
ID, appended as a function argument with the default ABI.
The design aims to provide a flexible framework for implementing
different token generation schemes. It currently supports the following
token modes:
- TypeHash (default): token IDs based on a hash of the allocated type
- Random: statically-assigned pseudo-random token IDs
- Increment: incrementing token IDs per TU
For the `TypeHash` mode introduce support for `!alloc_token` metadata:
the metadata can be attached to allocation calls to provide richer
semantic
information to be consumed by the AllocToken pass. Optimization remarks
can be enabled to show where no metadata was available.
An alternative "fast ABI" is provided, where instead of passing the
token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the
token ID is directly encoded into the name of the called function (e.g.,
`__alloc_token_0_malloc(size)`). Where the maximum tokens is small, this
offers more efficient instrumentation by avoiding the overhead of
passing an additional argument at each allocation site.
Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1]
---
This change is part of the following series:
1. https://github.com/llvm/llvm-project/pull/160131
2. https://github.com/llvm/llvm-project/pull/156838
3. https://github.com/llvm/llvm-project/pull/162098
4. https://github.com/llvm/llvm-project/pull/162099
5. https://github.com/llvm/llvm-project/pull/156839
6. https://github.com/llvm/llvm-project/pull/156840
7. https://github.com/llvm/llvm-project/pull/156841
8. https://github.com/llvm/llvm-project/pull/156842
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
|
|
Initially this was needed to replace the fixed-step canonical IV with
the variable-step EVL IV, but this was eventually superseded by the loop
vectorizer doing this transform itself in #147222. The pass was then
removed from the RISC-V pipeline in #151483 and the loop vectorizer
stopped emitting the metadata used by the pass in #155760, so now
there's no users of it.
|
|
This is a very rough state of what this can look like, but I didn't want
to spend too much time on what could be a dead end.
Currently the only way to invoke callbacks is by using the default
pipelines, this is an issue if you want to define your own pipeline
using the C string API (we do that in LLVM.jl in julia) so I extended
the api to allow for invoking those callbacks just like one would call a
pass of that kind.
There are some questions about the params that these callbacks take and
also I'm missing some of them (some of them are also invoked by the
backend so we may not want to expose them)
Code written with AI help, bugs are mine. (Not sure what policy for this
is on LLVM)
|
|
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added printer pass.
(3) Make old pass manager use `ReachingDefInfoWrapperPass`
|
|
This adds a new pass for dropping assumes that are unlikely to be useful
for further optimization.
It works by discarding any assumes whose affected values are one-use
(which implies that they are only used by the assume, i.e. ephemeral).
This pass currently runs at the start of the module optimization
pipeline, that is post-inline and post-link. Before that point, it is
more likely for previously "useless" assumes to become useful again,
e.g. because an additional user of the value is introduced after
inlining + CSE.
|
|
(#159516)
No loop pass seems to use now it after LoopPredication stopped using it
in https://reviews.llvm.org/D111668
|
|
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
|
|
This patch implements a correctly rounded expansion of the frem
instruction in LLVM IR. This is useful for target architectures for
which such an expansion is too involved to be implement in ISel
Lowering. The expansion is based on the code from the AMD device libs
and has been tested successfully against the OpenCL conformance tests on
amdgpu. The expansion is implemented in the preexisting "expand-fp"
pass. It replaces the expansion of "frem" in ISel for the amdgpu target;
it is enabled for targets which do not directly support "frem" and for
which no matching "fmod" LibCall is available.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
|
|
This patch introduces `SCEVDivisionPrinterPass` and registers it under
the name `print<scev-division>`, primarily for testing purposes. This
pass invokes `SCEVDivision::divide` upon encountering `sdiv`, and prints
the numerator, denominator, quotient, and remainder. It also adds
several test cases, some of which are currently incorrect and require
fixing.
Along with that, this patch added some comments to clarify the behavior
of `SCEVDivision::divide`, as follows:
- This function does NOT actually perform the division
- Given the `Numerator` and `Denominator`, find a pair
`(Quotient, Remainder)` s.t.
`Numerator = Quotient * Denominator + Remainder`
- The common condition `Remainder < Denominator` is NOT necessarily
required
- There may be multiple solutions for `(Quotient, Remainder)`, and this
function finds one of them
- Especially, there is always a trivial solution `(0, Numerator)`
- The following computations may wrap
- The multiplication of `Quotient` and `Denominator`
- The addition of `Quotient * Denominator` and `Remainder`
Related discussion: #154745
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Reverts llvm/llvm-project#148758
[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
|
|
Adding 2 passes, one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this, to `llc`)
Tracking issue: #147390
|
|
This allows for unit testing of finalizeBundle with standard MIR tests
using update_mir_test_checks.py.
|
|
Enabling one of MemorySSA or MD implies the other is off.
Already approved in https://github.com/llvm/llvm-project/pull/149473 but
I had to revert as I missed updating one test.
|
|
This reverts commit 60d2d94db253a9fdc7bd111120c803f808564b30.
|
|
Enabling one of MemorySSA or MD implies the other is off.
|
|
same as https://github.com/llvm/llvm-project/pull/138829
Co-authored-by : Oke, Akshat
<[Akshat.Oke@amd.com](mailto:Akshat.Oke@amd.com)>
|
|
same as https://github.com/llvm/llvm-project/pull/138828,
Co-authored-by : Oke, Akshat
<[Akshat.Oke@amd.com](mailto:Akshat.Oke@amd.com)>
|
|
|
|
|
|
This will be useful for testing the set of calls for different systems,
and eventually the product of context specific modifiers applied. In
the future we should also know the type signatures, and be able to
emit the correct one.
|
|
This allows `machine-function(p1,machine-function(...))` instead of
erroring.
Effectively it is flattened to a single MFPM.
|
|
Expose the FatLTO pipeline via `-passes="fatlto-pre-link<Ox>"`, similar
to all the other optimization pipelines. This is to allow reproducing it
outside clang. (Possibly also useful for C API users.)
|
|
|
|
Pipelines like `-passes="default<O3>"` are currently parsed in a special
way. Switch them to work like normal, parameterized module passes.
|
|
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Introduce a fresh analysis for recognizing polynomial hashes, with the
rationale that several targets have specific instructions to optimize
things like CRC and GHASH (eg. X86 and RISC-V crypto extension). We
limit the scope to polynomial hashes computed in a Galois field of
characteristic 2, since this class of operations can also be optimized
in the absence of target-specific instructions to use a lookup table.
At the moment, we only recognize the CRC algorithm.
RFC:
https://discourse.llvm.org/t/rfc-new-analysis-for-polynomial-hash-recognition/86268
|
|
- No need to prefix `PointerType` with `llvm::`.
- Avoid namespace block to define `PrintPipelinePasses`.
|
|
|
|
|
|
This PR introduces IR2Vec as an analysis pass. The changes include:
- Logic for generating Symbolic encodings.
- 75D learned vocabulary.
- lit tests.
Here is the link to the RFC -
https://discourse.llvm.org/t/rfc-enhancing-mlgo-inlining-with-ir2vec-embeddings
Acknowledgements: contributors -
https://github.com/IITH-Compilers/IR2Vec/graphs/contributors
---------
Co-authored-by: svkeerthy <venkatakeerthy@google.com>
Co-authored-by: Mircea Trofin <mtrofin@google.com>
|
|
(#131005)
When we enable EVL-based loop vectorization w/ predicated tail-folding,
each vectorized loop has effectively two induction variables: one
calculates the step using (VF x vscale) and the other one increases the
IV by values returned from experiment.get.vector.length. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.
The idea is that we use canonical IV all the way until the end of all
vectorizers, where we replace it with EVL-based IV using EVLIVSimplify
introduced here. Such that we can have the best from both worlds.
This Pass is enabled by default in RISC-V. However, since we haven't
really vectorize loops with predicate tail-folding by default, this Pass
is no-op at this moment.
|
|
This adds a GISelValueTrackingPrinterPass that can print the known bits
and sign bit of each def in a function. It is built on the new pass
manager and so adds a NPM GISelValueTrackingAnalysis, renaming the older
class to GISelValueTrackingAnalysisLegacy.
The first 2 functions from the AArch64GISelMITest are ported over to an
mir test to show it working. It also runs successfully on all files in
llvm/test/CodeGen/AArch64/GlobalISel/*.mir that are not invalid. It can
hopefully be used to test GlobalISel known bits analysis more directly
in common cases, without jumping through the hoops that the C++ tests
requires.
|
|
/llvm-project/llvm/lib/Passes/PassBuilder.cpp:1508:2:
error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
1508 | };
| ^
1 error generated.
|
|
|
|
|
|
- Add new pass manager version of `MachineUniformityAnalysis `.
- Query `TargetTransformInfo` in new pass manager version.
- Use `printAsOperand` when printing machine function name
|
|
|
|
|
|
|
|
Didn't find a test for this (but there are tests for the `Function`
version of this pass)
|
|
|
|
This completes the PreEmitPasses.
|
|
|
|
Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.
More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.
Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
|
|
|